
Argus — H.1.2

v.Chronos.1.Argus — legacy v.H.1.2

A single read-only panel that watches the entire bench-versioning system at once — runtime constants, pinned snapshot, suggested next version, and the fleet.

Argus is the admin oversight surface for the bench-versioning infrastructure that Pheidippides v.H.1.1nd put in place. The drift detector compares the live `BASELINE_*` / `WEIGHT_*` exports from `lib/scoring-formula.ts` + `lib/stoic-power.ts` against the `PINNED_SCORING_CONSTANTS` snapshot in `lib/bench-version.ts` and, when they disagree, suggests the next SemVer-compliant `BENCH_VERSION`. MAJOR when any `WEIGHT_*` value moved or a constant was added/removed (scoring scale shifted); MINOR when only `BASELINE_*` values moved (recalibration); PATCH for the defensive corner where only the snapshot hash drifted. The panel is a SUGGESTION engine — no DB write, no auto-bump button. The operator greenlights a bump by applying the suggested code edits manually (BENCH_VERSION + PINNED_SCORING_CONSTANTS entry + BENCH_VERSION_SNAPSHOT_HASH, all in a single commit). The drift-guard CI test rejects a hash mismatch, forcing the three movers to stay in lockstep. The 'version stamp == compiled code' invariant set by Pheidippides stays intact: no runtime override path, no operator-induced drift.
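The severity rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the actual computeDriftReport implementation: the type names, the function signatures, and the plain `MAJOR.MINOR.PATCH` version string are hypothetical. Only the MAJOR / MINOR / PATCH rules and the "increment one component, zero the lower ones" behaviour come from the text.

```typescript
// Hypothetical types; the real drift report shape is not described in these notes.
type Severity = "major" | "minor" | "patch";
type Constants = Record<string, number>;

// Classify the drift between live and pinned constants. Returns null when nothing drifted.
function classifyDrift(
  live: Constants,
  pinned: Constants,
  liveHash: string,
  pinnedHash: string,
): Severity | null {
  let baselineMoved = false;
  // Union of names from both maps, so additions and removals are both visited.
  for (const name of Object.keys({ ...pinned, ...live })) {
    const added = !(name in pinned);
    const removed = !(name in live);
    if (added || removed) return "major"; // scoring surface grew or shrank
    if (live[name] !== pinned[name]) {
      if (name.startsWith("WEIGHT_")) return "major"; // scoring scale shifted
      if (name.startsWith("BASELINE_")) baselineMoved = true; // recalibration
    }
  }
  if (baselineMoved) return "minor";
  // Defensive corner: values agree but the snapshot hash does not.
  return liveHash !== pinnedHash ? "patch" : null;
}

// Assumed signature for bumpSemver: increment one component, zero the lower ones.
function bumpSemver(version: string, severity: Severity): string {
  const [major, minor, patch] = version.split(".").map(Number);
  if (severity === "major") return `${major + 1}.0.0`;
  if (severity === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}
```

Returning early on the first MAJOR-class change keeps the "highest severity wins" rule explicit: a BASELINE_* move can only produce MINOR if no WEIGHT_* move or add/remove was seen.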

Why “Argus”

Argus Panoptes — the all-seeing giant of Greek myth — had a hundred eyes, of which only a few ever slept at any one moment. Hera assigned him to guard Io: he could watch from every angle simultaneously, and nothing entered or left his field of view without being noticed. The codename captures what this admin surface is for: a single read-only panel that watches every part of the bench-versioning system at once — live runtime constants, pinned snapshot, suggested bumps, fleet distribution, stale entities — so the operator never has to discover drift by accident.

Headline changes

Live-vs-pinned drift detector. The new /api/admin/bench-versioning/drift endpoint collects every BASELINE_* / WEIGHT_* export from lib/scoring-formula.ts + lib/stoic-power.ts at request time, hashes them alongside the current BENCH_VERSION using the same recipe the drift-guard CI test uses (sha256, truncated to 16 hex chars), and runs computeDriftReport to diff them against the pinned snapshot. The response surfaces a per-constant change table with signed percent delta and per-row classification.
Suggested next BENCH_VERSION. The drift report carries a suggested next version computed from the highest-severity change. Any WEIGHT_* change suggests MAJOR (scoring scale shifted). An added or removed constant also counts as MAJOR (the scoring surface itself grew or shrank). A change confined to BASELINE_* values suggests MINOR (recalibration). A bare hash difference suggests PATCH. The version is computed by bumpSemver(), which increments the appropriate component and zeroes the lower ones.
Fleet-wide version distribution. The new /api/admin/bench-versioning/fleet endpoint mines nodes.server_score_breakdown_json and virtual_containers.last_breakdown_json for the stamped score.benchVersion field, groups by version, and produces a distribution table with per-version count, percentage, and mismatch class. Pre-versioning rows (stamped before v.H.1.1nd) are bucketed as (pre-versioning) so the operator sees them as a distinct group rather than aggregated into nothing.
Stale-entities list, oldest first. Up to 20 entities on non-current bench versions, sorted by last_benched_at ascending. Each row carries its id, label, kind (real-node / virtual-prime / virtual-slave), stamped version, last bench timestamp, and mismatch class. The operator uses this to prioritise rebench targets after a BENCH_VERSION bump.
Ancient-admin-gated oversight page. /hub/bench-versioning renders the four cards (current state, suggestion, fleet distribution, stale entities). Ancient-admin only — same gate as the other system-level surfaces. Manual refresh button rather than a polling cadence, since both data sources only change on explicit operator action (deploy or bench).
Production-side pinning of the snapshot hash. The drift-guard CI test previously held the snapshot hash in its own constant; Argus lifts both BENCH_VERSION_SNAPSHOT_HASH and PINNED_SCORING_CONSTANTS into lib/bench-version.ts so a runtime endpoint can import them. The drift-guard test now reads from production — single source of truth — and adds a new assertion that the pinned map matches the live runtime exports key-for-key and value-for-value at test time, so the bump ceremony stays an atomic three-mover.
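As a rough illustration of the fleet-wide distribution described above, the grouping step might look like the sketch below. The row shape and the function name are hypothetical. The mismatch class is left out because these notes do not describe the classifier's rules. Only the grouping by stamped version, the per-version count and percentage, and the (pre-versioning) bucket come from the text.

```typescript
// Hypothetical row shape: stampedVersion is null for rows benched before versioning existed.
interface StampedEntity {
  id: string;
  stampedVersion: string | null;
}

interface VersionBucket {
  version: string;
  count: number;
  percentage: number;
}

const PRE_VERSIONING = "(pre-versioning)";

// Group entities by stamped bench version and compute each bucket's share of the fleet.
function versionDistribution(entities: StampedEntity[]): VersionBucket[] {
  const counts = new Map<string, number>();
  for (const entity of entities) {
    // Unstamped rows get their own visible bucket instead of being dropped.
    const version = entity.stampedVersion ?? PRE_VERSIONING;
    counts.set(version, (counts.get(version) ?? 0) + 1);
  }
  const total = entities.length;
  const buckets: VersionBucket[] = [];
  counts.forEach((count, version) => {
    buckets.push({
      version,
      count,
      percentage: total === 0 ? 0 : (count / total) * 100,
    });
  });
  // Largest bucket first, so the dominant version leads the table.
  return buckets.sort((a, b) => b.count - a.count);
}
```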

The greenlight invariant

Argus is a suggestion engine. There is no DB write, no auto-bump button, no server-stored “next version” record. When the panel surfaces a suggested bump, the operator greenlights it by applying three coordinated edits in lib/bench-version.ts within a single commit:

Bump BENCH_VERSION to the suggested value.
Update the relevant rows in PINNED_SCORING_CONSTANTS to the new live values.
Paste the new snapshot hash (printed by the drift-guard test on first re-run after the BENCH_VERSION bump) into BENCH_VERSION_SNAPSHOT_HASH.

The drift-guard CI test rejects any hash mismatch between the live constants and the pinned snapshot, so the three movers are forced to land in the same commit or CI goes red. This preserves the “version stamp == compiled code” invariant set by Pheidippides v.H.1.1nd: every stamped benchVersion in the database can be matched exactly to a checked-in commit, with no runtime override path and no possibility of operator-induced drift between what the constant says and what the math actually computes.
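The lockstep check can be sketched as follows. The digest (sha256) and the truncation to 16 hex characters are taken from these notes. The serialization that feeds the hash (version first, then sorted `NAME=value` lines) and both function names are assumptions made for illustration, so hashes produced by this sketch will not match the real BENCH_VERSION_SNAPSHOT_HASH.

```typescript
import { createHash } from "node:crypto";

type Constants = Record<string, number>;

// Assumed serialization: the version, then sorted "NAME=value" lines.
// Sorting makes the hash independent of export order.
function snapshotHash(benchVersion: string, constants: Constants): string {
  const lines = Object.keys(constants)
    .sort()
    .map((name) => `${name}=${constants[name]}`);
  const payload = [benchVersion, ...lines].join("\n");
  return createHash("sha256").update(payload).digest("hex").slice(0, 16);
}

// Returns the problems a guard of this kind would report; an empty list means lockstep.
function checkLockstep(
  benchVersion: string,
  live: Constants,
  pinned: Constants,
  pinnedHash: string,
): string[] {
  const problems: string[] = [];
  // Key-for-key and value-for-value comparison of the pinned map against live exports.
  for (const name of Object.keys({ ...pinned, ...live }).sort()) {
    if (!(name in pinned)) problems.push(`${name}: live but not pinned`);
    else if (!(name in live)) problems.push(`${name}: pinned but not live`);
    else if (live[name] !== pinned[name]) {
      problems.push(`${name}: live ${live[name]} != pinned ${pinned[name]}`);
    }
  }
  // The hash covers the version too, so bumping BENCH_VERSION alone also fails.
  const actual = snapshotHash(benchVersion, live);
  if (actual !== pinnedHash) problems.push(`hash: expected ${pinnedHash}, got ${actual}`);
  return problems;
}
```

Because the version participates in the hash, changing any one of the three movers without the other two leaves at least one entry in the returned list, which is what forces them into a single commit.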

For operators

The base Argus release was a pure observation surface: it did not change how a bench runs or how a score is computed. The patch series below (v.H.1.2a–n) extended it into the operational hub for fleet rebenching and fixed several scoring and display bugs surfaced by operators running real rebenches through it, including a per-tile CPU subscore that rendered 0.000 in the Container-Score detail panel even when the underlying score was correct. Argus still tells you what state the bench-versioning system is in and what edits to make if it has drifted; you can continue to bump BENCH_VERSION manually by editing the constants directly.

The four cards on /hub/bench-versioning — current state, suggestion (only on drift), fleet distribution, stale entities — share a single refresh button. No polling cadence, no auto-refresh; both data sources only change after an explicit deploy or an explicit bench, so the panel renders whatever was true the last time the operator pressed Refresh.

Patch series — what changed after the base release

Argus shipped as a read-only oversight panel and then grew, through the v.H.1.2a–n patch series, into the operator’s working surface for fleet rebenching. The current operator-facing reality at v.H.1.2n:

Entities split by kind, paginated. The single stale-entities list was replaced by separate “Real entities” and “Virtual entities” cards. Real entities are the chainweb-bearing rows (full hosts running chainweb + segregated chainweb children); segregated children show their cluster-id as the display name. A (prime) chip marks top-level rows that run chainweb directly.
Two-tier bench version + Greenlight. The panel distinguishes the internal (code) BENCH_VERSION from the live (greenlit) version stored in the database. A bench is only graded gold once an admin presses the Greenlight button to promote the version into production; pre-greenlight benches stamp NULL and render as (pre-versioning).
Transitive rebench. Force-rebenching a red segregated slice now cascades to its inherited sources — the drive bench (always) and the parent host bench (only when the parent itself is red) — so an inherited red value can’t silently persist.
Partial scores surface instead of “missing score”. A chainweb child with a real partial score sourced from its inherited drive bench + parent net + provision (no slice bench of its own) now shows that score, color-graded red for incompleteness, rather than rendering as missing.
Name is the link; live “Time since last benched” column (v.H.1.2k). The standalone Action/Open column was removed; the entity name itself now links to the node detail page. A new “Time since last benched” column shows a granular relative timer (days / hours / minutes, no seconds) that advances in real time. For segregated children, whose own last_benched_at is NULL by design, the timestamp is taken from the newest of (slice bench, the child’s drive bench, the parent host’s ghost-CPU cache) so the column is never blank for an entity that has a score.
CPU subscore detail-tile fix (v.H.1.2k). The per-tile CPU subscore in the Container-Score detail panel read the legacy slice column, which the Hipparchus slice handler intentionally leaves NULL (slice CPU is delegated to the per-CPU-class ghost cache). It rendered 0.000 even though the headline score correctly counted ghost-cache CPU. The detail tile now reads the same ghost-CPU cache the headline score uses.
Score-color consistency between Argus and the per-node panel (v.H.1.2l). Argus graded a chainweb child’s bench-version stamp by reading nodes.server_score_breakdown_json, which is always NULL for segregated children (their bench data lives in segregated_slice_benchmarks) — so every child rendered red regardless of its real, current slice stamp, while the per-node benchmark panel (which reads the slice stamp directly) showed it gold. Argus now sources the version stamp for segregated children from the newest slice bench, the same value the per-node card uses, so the two surfaces agree. A child with no slice bench of its own stays red — the correct “incompletely benched” signal.
“Last benched” no longer over-attributes shared-resource benches (v.H.1.2m). A chainweb child’s “Last benched” is now its own slice bench. Previously it was the newest of (own slice, drive bench, parent ghost-CPU cache) — but the drive bench is shared by every child on the drive and the ghost-CPU cache is shared by every child of the parent host, so rebenching one child made all its siblings falsely report the same recent time. A child with no slice bench of its own still falls back to the inherited date so it isn’t blank. With this, Argus and the per-node panel agree on score, version-color, and date for every entity.
Spurious “benchmark-host-drive failed” on slice rebench fixed (v.H.1.2n). Force-rebenching a chainweb child cascaded a drive rebench that was enqueued without the host id the drive-bench handler requires, so it failed instantly and showed up as a failed job — even though the real drive bench (enqueued by the rebench action itself) succeeded in parallel. The cascade now passes the owning host and de-dups against an in-flight drive bench, so there is no spurious failure and no double drive bench. Two identically-provisioned chainweb children on the same host and drive getting the same score is expected (the score measures capacity; identical hardware ⇒ identical capacity), not a defect.
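The “Last benched” rule as it stands after v.H.1.2m, together with the relative timer from v.H.1.2k, might be sketched like this. The input shape, the function names, the choice of the newest inherited timestamp as the fallback, and the `2d 3h 4m` display format are all assumptions. The text only fixes that a child's own slice bench wins, that inherited dates are a fallback, and that the timer shows days, hours, and minutes without seconds.

```typescript
// Hypothetical input: millisecond epoch timestamps, null when no bench of that kind exists.
interface ChildBenchTimes {
  sliceBenchedAt: number | null;
  driveBenchedAt: number | null;
  parentGhostCpuAt: number | null;
}

// A child's own slice bench wins; shared-resource benches are only a fallback.
function lastBenchedAt(times: ChildBenchTimes): number | null {
  if (times.sliceBenchedAt !== null) return times.sliceBenchedAt;
  const inherited = [times.driveBenchedAt, times.parentGhostCpuAt].filter(
    (t): t is number => t !== null,
  );
  // Assumption: the fallback uses the newest inherited timestamp.
  return inherited.length > 0 ? Math.max(...inherited) : null;
}

// Format elapsed time as days / hours / minutes, never seconds.
function formatSince(benchedAt: number, now: number): string {
  const totalMinutes = Math.max(0, Math.floor((now - benchedAt) / 60000));
  const days = Math.floor(totalMinutes / 1440);
  const hours = Math.floor((totalMinutes % 1440) / 60);
  const minutes = totalMinutes % 60;
  const parts: string[] = [];
  if (days > 0) parts.push(`${days}d`);
  if (days > 0 || hours > 0) parts.push(`${hours}h`);
  parts.push(`${minutes}m`);
  return parts.join(" ");
}
```

With the own-slice rule first, rebenching one child moves only that child's timestamp, even though its siblings share the same drive bench and ghost-CPU cache.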

Pheidippides — H.1.1 — the immediately preceding arc; transport rehaul plus the v.H.1.1nd versioning infrastructure Argus builds on (the BENCH_VERSION constant, the mismatch classifier, the drift-guard CI test).
/docs/scoring — the canonical scoring documentation, untouched by H.1.2 since the math did not change.