All three caveats land — agree the generic template should stay generic, and the LLD path is exactly the right way to thread that needle without bloating it.
On the chain-tip lag specifically: the sustained-delta-vs-instant distinction is the right call. From operational data on a few Fulcrum/electrs nodes, normal indexing-after-block-arrival sits at 1-3s on Fulcrum and 10-30s on electrs (heavy reorg states aside), so an alert on lag above 60s sustained for more than 2 minutes catches real divergence without firing on every fee spike. The bitcoin-cli getblockcount UserParameter is clean if the host has bitcoind locally; the calculated-item path is nicer when the bitcoind template is already on the box because it avoids a second auth/RPC surface.
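To make the sustained-vs-instant point concrete, here is a rough Python sketch of the logic, using the 60s / 2 minute numbers above as defaults. The class and field names are made up for illustration; in Zabbix itself this would live in a trigger expression rather than in code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedLagDetector:
    """Hypothetical helper: fires only when lag stays above a threshold
    for at least hold_s seconds, instead of on a single bad sample."""
    threshold_s: float = 60.0   # lag that counts as a breach
    hold_s: float = 120.0       # how long the breach must persist
    _breach_start: Optional[float] = None

    def update(self, now_s: float, lag_s: float) -> bool:
        """Feed one (timestamp, lag) sample; return True if the alert should fire."""
        if lag_s <= self.threshold_s:
            # Any healthy sample resets the clock.
            self._breach_start = None
            return False
        if self._breach_start is None:
            self._breach_start = now_s
        return (now_s - self._breach_start) >= self.hold_s
```

A single 90s sample does nothing; only a breach that persists for the full hold window fires, and one healthy sample clears it.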
For the implementation-specific metrics, Zabbix Low-Level Discovery feels like the natural fit — a separate tmpl_electrum_fulcrum.xml / tmpl_electrum_electrs.xml that key off a discovered macro (e.g. {#ELECTRUM_IMPL} from a small detection script that probes server.version response or a known endpoint), inherited from the generic core. That keeps the core lean while letting operators bolt on the implementation-specific dashboards without forking the template. It's how the Postgres community templates handle pgbouncer vs Patroni, which is a useful precedent.
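A rough sketch of what that detection script could look like, in Python. Assumptions on my side: the server speaks plaintext newline-delimited JSON-RPC on the given port (TLS would need wrapping), the first element of the server.version result is a software banner containing the implementation name, and the function names plus the "zbx-detect" client name are invented for the example.

```python
import json
import socket
import sys


def classify_impl(banner: str) -> str:
    """Map a server.version banner to an implementation tag.
    Substring matching is an assumption about banner formats."""
    lowered = banner.lower()
    for name in ("fulcrum", "electrumx", "electrs"):
        if name in lowered:
            return name
    return "unknown"


def lld_payload(impl: str) -> str:
    """Build Zabbix LLD JSON carrying the {#ELECTRUM_IMPL} macro."""
    return json.dumps({"data": [{"{#ELECTRUM_IMPL}": impl}]})


def probe_server_version(host: str, port: int, timeout: float = 5.0) -> str:
    """Send server.version over plaintext TCP and return the software banner."""
    request = {"id": 0, "method": "server.version",
               "params": ["zbx-detect", "1.4"]}
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            line = reader.readline()
    reply = json.loads(line)
    return str(reply["result"][0])


# Usage: python detect_impl.py <host> <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    try:
        found = probe_server_version(sys.argv[1], int(sys.argv[2]))
    except (OSError, ValueError, KeyError, IndexError, TypeError):
        found = ""  # unreachable or malformed reply -> "unknown"
    print(lld_payload(classify_impl(found)))
```

Falling back to "unknown" rather than erroring means the generic core keeps working even when the probe fails, and the implementation-specific templates simply never attach.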
On Tor circuit health: agreed on optional/separate. Worth noting that even the optional flavor benefits from a "Tor reachable from this host" boolean as the gating macro, so the rest of the template doesn't fire phantom "Electrum down" alerts when the actual cause is tor.service having flapped.
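Something like the following is what I have in mind for the gate, sketched in Python. It assumes Tor's SOCKS listener is on the default 127.0.0.1:9050, and a TCP connect only shows the daemon is listening, not that circuits are healthy. The function names are hypothetical, and inside Zabbix the same effect would more likely come from a trigger dependency than from code.

```python
import socket


def tor_socks_reachable(host: str = "127.0.0.1", port: int = 9050,
                        timeout: float = 2.0) -> bool:
    """Cheap liveness check: can we open a TCP connection to the SOCKS port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify_outage(tor_ok: bool, electrum_ok: bool,
                    onion_only: bool = True) -> str:
    """Attribute a failure to Tor first when the server is .onion-only."""
    if electrum_ok:
        return "ok"
    if onion_only and not tor_ok:
        # Electrum is unreachable, but so is Tor: blame Tor, not Electrum.
        return "tor_down"
    return "electrum_down"
```

For a clearnet server (onion_only=False) the Tor state is ignored, so the gate only suppresses alerts where Tor is actually on the path.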
This is going on my "good ops template" reference list — thanks for shipping it in the open.
Thanks, good points.
I agree the highest-value addition is chain-tip lag vs the underlying bitcoind. The current template catches “height stopped changing”, but not “Electrum is still moving, just behind bitcoind”. I'd probably add this either as an optional bitcoin-cli getblockcount UserParameter or as a calculated item when the host already uses a bitcoind Zabbix template, then alert on a sustained delta rather than instant >1 block to avoid normal propagation/indexing noise.
The p95 get_history latency / subscription queue / peer disagreement metrics are useful operationally, but I’d keep them out of this first generic template. They’re very implementation-specific: Fulcrum, ElectrumX and electrs expose very different internals, and the standard Electrum protocol itself doesn’t expose those metrics.
Tor circuit health is a good optional add for .onion-only setups, but I’d also keep that separate/optional because Tor control port auth/config adds another moving part.