Background work (SSH probes, benchmarks, installs, backups, certbot obtains, seed refreshes, scoring ticks, tip polls) runs through a single job queue owned by the worker process. Everything is resumable, audit-logged, and visible on /admin/jobs.
Worker lease
Only one worker at a time executes jobs. The lease lives in worker_lease in the SQLite DB and is renewed every ~5 seconds by the current leader. If the leader disappears (crash, deploy), a replacement acquires the stale lease and carries on. Multiple worker instances can stand by without conflict.
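The takeover rule can be sketched as a pure function over the lease row. This is a minimal illustration, not the project's implementation: the `Lease` shape, the `tryAcquire` name, and the staleness threshold of three missed renewals are assumptions; the text only states the ~5 second renewal and that the lease lives in worker_lease.

```typescript
// Hypothetical in-memory model of the single row in worker_lease.
interface Lease {
  holder: string;    // id of the worker that currently leads
  renewedAt: number; // epoch milliseconds of the last renewal
}

const RENEW_INTERVAL_MS = 5000;                 // from the text: renewed every ~5 seconds
const STALE_AFTER_MS = 3 * RENEW_INTERVAL_MS;   // assumption: three missed renewals means stale

// Returns the lease that should be stored after `workerId` tries to lead at time `now`.
function tryAcquire(current: Lease | null, workerId: string, now: number): Lease {
  if (current === null) {
    return { holder: workerId, renewedAt: now }; // no leader yet
  }
  if (current.holder === workerId) {
    return { holder: workerId, renewedAt: now }; // leader renewing its own lease
  }
  if (now - current.renewedAt > STALE_AFTER_MS) {
    return { holder: workerId, renewedAt: now }; // previous leader went silent: take over
  }
  return current; // lease is fresh and held by someone else: stay on standby
}
```

In a SQLite-backed version the check and the write would have to happen in one atomic statement or transaction, otherwise two standbys could both see the same stale lease and both claim it.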
Parallel job execution
Inside the leader, up to MAX_PARALLEL_JOBS = 32 jobs run concurrently. A per-node lock stops two jobs touching the same target at once; per-kind caps stop one kind (e.g. backup-stoachain) saturating every slot. Different nodes make independent progress; a 6-minute benchmark on node A doesn't block a quick probe on node B.
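The three admission rules (global cap, per-node lock, per-kind cap) can be combined in one selection pass. The sketch below is illustrative only: the `Job` shape, the `pickRunnable` name, and the cap of 4 for backup-stoachain are assumptions; only MAX_PARALLEL_JOBS = 32 comes from the text.

```typescript
interface Job {
  id: number;
  kind: string;
  node: string;
}

const MAX_PARALLEL_JOBS = 32; // from the text

// Hypothetical per-kind caps; kinds not listed are limited only by the global cap.
const KIND_CAPS = new Map<string, number>([["backup-stoachain", 4]]);

// Picks the queued jobs that may start now, given the jobs already running.
function pickRunnable(queued: Job[], running: Job[]): Job[] {
  const busyNodes = new Set<string>(running.map((j) => j.node));
  const perKind = new Map<string, number>();
  for (const j of running) {
    perKind.set(j.kind, (perKind.get(j.kind) ?? 0) + 1);
  }

  const started: Job[] = [];
  for (const job of queued) {
    if (running.length + started.length >= MAX_PARALLEL_JOBS) break; // global cap reached
    if (busyNodes.has(job.node)) continue;                           // per-node lock is held
    const cap = KIND_CAPS.get(job.kind) ?? MAX_PARALLEL_JOBS;
    const used = perKind.get(job.kind) ?? 0;
    if (used >= cap) continue;                                       // per-kind cap reached
    busyNodes.add(job.node);
    perKind.set(job.kind, used + 1);
    started.push(job);
  }
  return started;
}
```

With a benchmark running on node A, a queued probe for node A is skipped while a probe for node B starts immediately, which is the independence described above.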
Job kinds
Handlers are registered in lib/handlers/registry. Currently 15 kinds:
- apt-upgrade
- backup-stoachain
- benchmark-node
- drive-benchmark
- netdata-install
- node-test
- peer-trust-reset
- seed-refresh
- stoachain-cert-rotate
- stoachain-certbot-obtain
- stoachain-control
- stoachain-convert-supervision
- stoachain-install
- stoachain-reseed
- system-probe
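A registry of this kind is typically a map from kind name to handler function. The sketch below shows one possible shape; the `Handler` signature and the `registerHandler` and `getHandler` names are hypothetical and are not taken from lib/handlers/registry. Only the 15 kind names come from the list above.

```typescript
// The 15 kinds listed above, as a literal tuple so the compiler can check names.
const JOB_KINDS = [
  "apt-upgrade", "backup-stoachain", "benchmark-node", "drive-benchmark",
  "netdata-install", "node-test", "peer-trust-reset", "seed-refresh",
  "stoachain-cert-rotate", "stoachain-certbot-obtain", "stoachain-control",
  "stoachain-convert-supervision", "stoachain-install", "stoachain-reseed",
  "system-probe",
] as const;

type JobKind = (typeof JOB_KINDS)[number];

// Hypothetical handler signature: receives the target node id.
type Handler = (nodeId: string) => Promise<void>;

const registry = new Map<JobKind, Handler>();

// Registers a handler once per kind; a duplicate is treated as a programming error.
function registerHandler(kind: JobKind, handler: Handler): void {
  if (registry.has(kind)) {
    throw new Error(`duplicate handler for kind "${kind}"`);
  }
  registry.set(kind, handler);
}

// Looks up the handler for a kind read from the queue; unknown kinds fail loudly.
function getHandler(kind: string): Handler {
  const handler = registry.get(kind as JobKind);
  if (handler === undefined) {
    throw new Error(`no handler registered for kind "${kind}"`);
  }
  return handler;
}
```

Typing the kind as a union means a misspelled kind at a registration site is a compile-time error, while a stale kind stored in the queue still fails at lookup with a clear message.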
Background ticks (not job-queued)
Some work isn't queued; it runs directly in the worker loop on throttled cadences:
- Chainweb tip poller: every 30 s, 8-way concurrency, SSH-probes every node's cut height and writes to node_chainweb_tip.
- Scoring tick: every 60 s, runs the 7-gate eligibility engine for every node.
- Rich-list materialisation: hourly, refreshes the rich_list_mv table.
- Daily integer mint: once per UTC day at 06:00, sweeps Current → Redeemed for every account.
- Hub-scores nightly backup: once per UTC day after 03:00.
- Job sweep: hourly, drops completed jobs older than 30 days.
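The cadences in the list above reduce to two checks: "has the interval elapsed?" and "has today's UTC run happened yet?". The sketch below models both as pure functions so the worker loop can call them on every pass. The tick names, function names, and the choice to treat "at 06:00" as "at or after 06:00 UTC, once per day" are assumptions.

```typescript
interface IntervalTick {
  name: string;
  everyMs: number;
}

// Intervals taken from the list above; the names are illustrative labels.
const INTERVAL_TICKS: IntervalTick[] = [
  { name: "chainweb-tip-poller", everyMs: 30 * 1000 },
  { name: "scoring-tick", everyMs: 60 * 1000 },
  { name: "rich-list-materialisation", everyMs: 60 * 60 * 1000 },
  { name: "job-sweep", everyMs: 60 * 60 * 1000 },
];

// Returns the interval ticks that are due at `now` and records `now` as their last run.
function dueIntervalTicks(now: number, lastRun: Map<string, number>): string[] {
  const due: string[] = [];
  for (const tick of INTERVAL_TICKS) {
    const last = lastRun.get(tick.name);
    if (last === undefined || now - last >= tick.everyMs) {
      lastRun.set(tick.name, now);
      due.push(tick.name);
    }
  }
  return due;
}

// True when a once-per-UTC-day task should run: the UTC hour has been reached
// and the task has not yet run on the current UTC date.
function dailyTickDue(now: Date, hourUtc: number, lastRunDay: string | null): boolean {
  const today = now.toISOString().slice(0, 10); // "YYYY-MM-DD", always in UTC
  return now.getUTCHours() >= hourUtc && lastRunDay !== today;
}
```

Tracking the last run day as a UTC date string means a worker that restarts after the target hour still performs the daily task exactly once, provided the date is persisted.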
SSH connection pool
As of v0.7.8z14, runRemote is backed by a connection pool keyed by user@host:port. Pooled connections have an idle TTL of 5 min and a maximum age of 1 h; a reaper runs on a 60 s interval. Multi-channel ssh2 lets concurrent exec calls share one connection, so there is no handshake overhead per call.
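The pool's bookkeeping can be sketched without touching ssh2 itself. The example below covers only key construction and the reaper's eviction decision; the `PoolEntry` shape, the function names, and the rule that a connection with active channels is never evicted are assumptions rather than details of runRemote.

```typescript
const IDLE_TTL_MS = 5 * 60 * 1000; // idle TTL: 5 min (from the text)
const MAX_AGE_MS = 60 * 60 * 1000; // max connection age: 1 h (from the text)

// Hypothetical bookkeeping kept alongside each pooled connection.
interface PoolEntry {
  createdAt: number;      // epoch ms when the connection was opened
  lastUsedAt: number;     // epoch ms when the last exec finished
  activeChannels: number; // exec calls currently sharing the connection
}

// Builds the pool key in the user@host:port form described above.
function poolKey(user: string, host: string, port: number): string {
  return `${user}@${host}:${port}`;
}

// Decides whether the reaper should close a connection at time `now`.
function shouldEvict(entry: PoolEntry, now: number): boolean {
  if (entry.activeChannels > 0) return false; // assumption: never close mid-exec
  const idleTooLong = now - entry.lastUsedAt >= IDLE_TTL_MS;
  const tooOld = now - entry.createdAt >= MAX_AGE_MS;
  return idleTooLong || tooOld;
}

// One reaper pass (intended to run every 60 s): removes evictable entries, returns their keys.
function reap(pool: Map<string, PoolEntry>, now: number): string[] {
  const evicted: string[] = [];
  pool.forEach((entry, key) => {
    if (shouldEvict(entry, now)) evicted.push(key);
  });
  for (const key of evicted) pool.delete(key);
  return evicted;
}
```

A real reaper would also end the underlying connection for each evicted key; that step is omitted here to keep the sketch independent of the SSH client.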
Where this grows
T3 (≥ 350 active nodes) adds composite indexes on the hot paths and moves job logs out of the DB into flat files. T4 swaps the in-process queue for BullMQ and the SQLite state for Postgres. See §11 Scaling plan.