Monitoring and Alerting

A production validator needs continuous monitoring, not just one-off diagnosis. This page covers the ongoing metrics-and-alerts setup; for one-time profiling of a slow node, see Performance and Profiling.

Prometheus metrics

CometBFT exposes Prometheus metrics. Enable them in config.toml:

[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"

The node then serves metrics over HTTP on that port, by default at the /metrics path, for a Prometheus server to scrape. From there, Grafana dashboards visualize the data, and Alertmanager (or your own alerting stack) fires alerts when thresholds are crossed.
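Before wiring up a full Prometheus server, it can help to confirm that the endpoint is live and see what it returns. The following is a minimal sketch, not part of CometBFT: it fetches the endpoint and parses the Prometheus text format into a dictionary. The URL assumes the listen address configured above and the /metrics path; fetch_metrics, parse_metrics, and metric_value are hypothetical helper names, and the parser handles only simple sample lines.

```python
import urllib.request


def fetch_metrics(url="http://localhost:26660/metrics", timeout=5.0):
    """Download the raw metrics text. The URL is an assumption based on the
    listen address shown in config.toml; adjust it for your node."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text exposition into {"name{labels}": value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. An optional
    trailing timestamp is ignored. Lines that do not parse are dropped.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rfind("}")
            key, rest = line[: end + 1], line[end + 1:].split()
        else:
            fields = line.split()
            key, rest = fields[0], fields[1:]
        if not rest:
            continue
        try:
            samples[key] = float(rest[0])
        except ValueError:
            continue
    return samples


def metric_value(samples, name):
    """Return the first sample for a metric name, ignoring labels, or None."""
    for key, value in samples.items():
        if key == name or key.startswith(name + "{"):
            return value
    return None
```

A typical call would be metric_value(parse_metrics(fetch_metrics()), "cometbft_consensus_height"). That metric name is an assumption: the prefix depends on the namespace configured in the [instrumentation] section and on the CometBFT version, so check the names your node actually exposes.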

What to alert on

The signals that matter most for a validator:

  • Missed blocks / not signing. Usually the earliest warning that something is wrong; sustained misses lead to downtime slashing.
  • Block height stalled or falling behind peers. The node is stuck or out of sync.
  • Peer count dropping toward zero. Networking or connectivity failure.
  • Disk filling up. Even a pruned node's data directory keeps growing, and a node that runs out of disk space halts.
  • Sentry/validator connectivity (if using a sentry architecture); see Validator Security.
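The first four signals above can be sketched as a small rule check over two consecutive samples. This is an illustration only, not a replacement for Prometheus alerting rules: NodeSample, evaluate, and disk_used_fraction are hypothetical names, every threshold is a placeholder to tune for your chain, and where the missed-block count comes from is left open because it depends on your setup. Comparing height against peers and checking sentry connectivity are not covered.

```python
import shutil
from dataclasses import dataclass


@dataclass
class NodeSample:
    timestamp: float      # seconds since epoch when the sample was taken
    height: float         # latest block height reported by the node
    peers: float          # number of connected peers
    missed_blocks: float  # cumulative blocks this validator failed to sign


def disk_used_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def evaluate(prev, curr, disk_fraction, *, stall_seconds=60.0,
             max_new_misses=0, min_peers=3, max_disk=0.85):
    """Return the names of the alerts that fire for this pair of samples.

    All default thresholds are illustrative assumptions, not recommendations.
    """
    alerts = []
    elapsed = curr.timestamp - prev.timestamp
    # Height did not advance over a window at least stall_seconds long.
    if curr.height <= prev.height and elapsed >= stall_seconds:
        alerts.append("height_stalled")
    # The validator missed more blocks than tolerated since the last sample.
    if curr.missed_blocks - prev.missed_blocks > max_new_misses:
        alerts.append("missed_blocks")
    # Peer count is dropping toward zero.
    if curr.peers < min_peers:
        alerts.append("low_peers")
    # Disk usage is at or above the configured ceiling.
    if disk_fraction >= max_disk:
        alerts.append("disk_filling")
    return alerts
```

In a real deployment the same conditions would more likely be written as alerting rules evaluated by Prometheus, with a duration on each rule so that a single bad scrape does not page anyone.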

Reference

  • One-time profiling and diagnosis: Performance and Profiling.
  • CometBFT instrumentation: the [instrumentation] section of config.toml.