
Hipparchus migration runbook

Hipparchus: v.H.1.0

Hipparchus rebuilds the ServerScore math from the ground up: CPU splits into 8 weighted sub-tests, RAM into quantity + speed, Network into 5 subtests with both 100% and chainweb-minimum baselines, and segregated slices inherit CPU through a ghost cache keyed on (host, cpu_model, vCPU). Every existing stamped row on disk was computed under the prior formula; this runbook is how an operator migrates the fleet forward without losing data.
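As a rough illustration of the ghost-cache keying, the sketch below builds a key from the (host, cpu_model, vCPU) triple and counts distinct ghost-CPU classes. The `Slice` shape, the field names, and the sample values are hypothetical; the hub's real cache table may be structured differently.

```typescript
// Hypothetical shape of a segregated slice; field names are illustrative only.
interface Slice {
  host: string;
  cpuModel: string;
  vCpu: number;
}

// Build a stable key from the (host, cpu_model, vCPU) triple.
// Serialising a tuple avoids separator collisions inside host or model names.
function ghostKey(s: Slice): string {
  return JSON.stringify([s.host, s.cpuModel, s.vCpu]);
}

// Count the distinct ghost-CPU classes across a set of slices.
function distinctGhostClasses(slices: Slice[]): number {
  return new Set(slices.map(ghostKey)).size;
}

const slices: Slice[] = [
  { host: "host-a", cpuModel: "model-x", vCpu: 4 },
  { host: "host-a", cpuModel: "model-x", vCpu: 4 }, // same triple, same class
  { host: "host-a", cpuModel: "model-x", vCpu: 8 },
  { host: "host-b", cpuModel: "model-x", vCpu: 4 },
];

console.log(distinctGhostClasses(slices)); // 3
```

Slices that share a triple collapse into one class, which is why the rebench plan counts ghost-CPU classes rather than individual slices.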

When to run Force-fresh fleet rebench

Run it after every Hipparchus shipment, and before treating the ServerScores rendered on /hub/nodes as authoritative. Pre-shipment stamped rows sit on the prior formula; Force-fresh overwrites them with Hipparchus-stamped values across every visible Prime and every distinct ghost-CPU class.

  • Affordance. The third action card on /hub/fleet-maintenance, labelled Force-fresh fleet rebench (Action D, alongside Restamp and the legacy Rebench cards).
  • Confirm dialog. Surfaces the wall-time estimate, the per-wave plan (Wave 1 Primes → Wave 2 ghost-CPU triples → optional Wave 3 drives → Wave 4 restamp), and the per-Prime downtime statement (each Prime stops its chainweb for ~5 min during Wave 1, credited back to the warmup counter automatically).
  • Wave 3 is opt-in. Default off for the Hipparchus rollout: the disk subscore formula is unchanged in this overhaul, so the existing stamped per-drive figures stay valid. Toggle the checkbox in the confirm dialog only when you actually want to re-bench drives (e.g. after a drive replacement on a segregated host).
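The wave ordering described in the list above can be sketched as a small plan builder. The `Wave` and `FleetCounts` types and the `buildWavePlan` name are hypothetical, chosen for illustration; the hub's actual plan enumeration is not shown in this runbook.

```typescript
// Hypothetical types for illustration; not the hub's real plan structure.
interface Wave {
  wave: number;
  label: string;
  jobs?: number; // left unset for the restamp wave, whose job count is not documented here
}

interface FleetCounts {
  primeHosts: number;
  ghostClasses: number;
  drives: number;
}

// Enumerate waves in launch order. Wave 3 is included only when opted in.
function buildWavePlan(fleet: FleetCounts, includeDrives = false): Wave[] {
  const plan: Wave[] = [
    { wave: 1, label: "Primes", jobs: fleet.primeHosts },
    { wave: 2, label: "ghost-CPU triples", jobs: fleet.ghostClasses },
  ];
  if (includeDrives) {
    plan.push({ wave: 3, label: "drives", jobs: fleet.drives });
  }
  plan.push({ wave: 4, label: "restamp" });
  return plan;
}

// Render the wave numbers of a plan as a readable sequence.
function waveOrder(plan: Wave[]): string {
  return plan.map((w) => w.wave).join(" -> ");
}

const fleet: FleetCounts = { primeHosts: 5, ghostClasses: 12, drives: 4 };
console.log(waveOrder(buildWavePlan(fleet)));       // 1 -> 2 -> 4
console.log(waveOrder(buildWavePlan(fleet, true))); // 1 -> 2 -> 3 -> 4
```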

Expected wall-time

The hub computes the estimate from the enumerated plan once the action is triggered. The same formula is documented here so you can sanity-check the surfaced number against your own fleet before launching:

wallMinutes = (prime_hosts   × 6 min ÷ 8)
            + (ghost_classes × 4 min ÷ 8)
            + (drives        × 5 min ÷ 8)   // only when Wave 3 opted in

The constants reflect pages/api/admin/nodes/benchmark-fleet.ts: 6 min per Prime full-host bench, 4 min per ghost-CPU class bench, 5 min per drive bench, divided by the 8-way parallelism cap the job worker enforces (MAX_CONCURRENT_BY_KIND for the benchmark kinds).

Worked example. A fleet of 5 Primes + 12 distinct ghost-CPU classes (the typical shape: a handful of host machines, each hosting two or three segregated slices on the same CPU triple) with Wave 3 left at its default off:

wallMinutes = (5 × 6 ÷ 8) + (12 × 4 ÷ 8) + (0 × 5 ÷ 8)
            = 3.75        + 6.0          + 0
            = 9.75 ≈ 10 min

Same fleet with Wave 3 opted in (say 4 distinct drives on the segregated hosts) adds 4 × 5 ÷ 8 = 2.5 min, for a total of 12.25 min (about 12 min). Larger fleets scale roughly linearly under this formula: each wave's term is its job count divided by the 8-way parallelism cap, so the wave with the most work dominates the total.
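For a quick sanity check before launching, the documented formula can be reproduced in a few lines. This is a minimal sketch: the constants are copied from the text above, the function and constant names are hypothetical, and pages/api/admin/nodes/benchmark-fleet.ts remains the source of truth.

```typescript
// Constants as documented in this runbook; names are illustrative.
const PRIME_BENCH_MIN = 6; // minutes per Prime full-host bench
const GHOST_BENCH_MIN = 4; // minutes per ghost-CPU class bench
const DRIVE_BENCH_MIN = 5; // minutes per drive bench
const PARALLELISM_CAP = 8; // concurrent benchmark jobs

// Estimate wall time in minutes. Drives count only when Wave 3 is opted in.
function estimateWallMinutes(
  primeHosts: number,
  ghostClasses: number,
  drives: number,
  includeDrives = false,
): number {
  const wave1 = (primeHosts * PRIME_BENCH_MIN) / PARALLELISM_CAP;
  const wave2 = (ghostClasses * GHOST_BENCH_MIN) / PARALLELISM_CAP;
  const wave3 = includeDrives ? (drives * DRIVE_BENCH_MIN) / PARALLELISM_CAP : 0;
  return wave1 + wave2 + wave3;
}

console.log(estimateWallMinutes(5, 12, 4));       // 9.75
console.log(estimateWallMinutes(5, 12, 4, true)); // 12.25
```

The printed values are unrounded; the confirm dialog may round them for display.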

Mixed-formula display window

Between the moment Hipparchus deploys and the moment Force-fresh completes, /hub/nodes renders a mix of pre-Hipparchus stamped scores and Hipparchus-stamped scores. Each node renders under whatever formula its latest benchmark_runs row was stamped under; pre-shipment rows continue to display their legacy figure, freshly-rebenched rows display the new figure.
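The "latest row wins" rendering rule can be sketched as follows. The `BenchRow` shape, the `stampedAt` field, and the sample values are assumptions made for illustration; the real benchmark_runs columns may be named differently.

```typescript
// Hypothetical row shape; only the fields this sketch needs.
interface BenchRow {
  nodeId: string;
  stampedAt: number; // assumed to be epoch milliseconds
  score: number;
}

// Pick the most recently stamped row for each node.
function latestRowPerNode(rows: BenchRow[]): Map<string, BenchRow> {
  const latest = new Map<string, BenchRow>();
  for (const row of rows) {
    const current = latest.get(row.nodeId);
    if (current === undefined || row.stampedAt > current.stampedAt) {
      latest.set(row.nodeId, row);
    }
  }
  return latest;
}

const rows: BenchRow[] = [
  { nodeId: "n1", stampedAt: 100, score: 0.82 }, // legacy stamp
  { nodeId: "n1", stampedAt: 200, score: 0.74 }, // fresh rebench, rendered instead
  { nodeId: "n2", stampedAt: 150, score: 0.9 },  // not yet rebenched
];

const rendered = latestRowPerNode(rows);
console.log(rendered.get("n1")?.score); // 0.74
console.log(rendered.get("n2")?.score); // 0.9
```

During the window, a node such as n2 keeps showing its older figure simply because no newer row exists for it yet.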

Don't draw fleet-wide conclusions during this window. Comparing a pre-Hipparchus 0.82 against a Hipparchus 0.74 is meaningless: they were produced by different formulae. Wait for the Wave 4 restamp signal (the action card surfaces a terminal-state badge once Wave 4 completes) before treating any comparison as authoritative.

The breakdown JSON carries a schema_version stamp on each row, so the underlying data distinguishes pre/post cleanly. Surface-side, the simplest discrimination is “was this row produced by the Force-fresh run that just completed?”: if yes, it's Hipparchus-stamped; if no, it's legacy.
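A guard along these lines could keep tooling from comparing scores across formulae. It is only a sketch: the numeric form of schema_version and the threshold `HIPPARCHUS_SCHEMA_VERSION = 2` are assumptions, as are the type and function names. Check the actual value stamped by your deployment before relying on it.

```typescript
// Assumption: schema_version is numeric and Hipparchus rows carry a value
// at or above this hypothetical threshold.
const HIPPARCHUS_SCHEMA_VERSION = 2;

interface StampedScore {
  nodeId: string;
  score: number;
  breakdownJson: string; // raw breakdown JSON as stored on the row
}

// Decide whether a row was stamped under the Hipparchus formula.
function isHipparchusStamped(row: StampedScore): boolean {
  try {
    const parsed: unknown = JSON.parse(row.breakdownJson);
    if (typeof parsed !== "object" || parsed === null) return false;
    const version = (parsed as { schema_version?: unknown }).schema_version;
    return typeof version === "number" && version >= HIPPARCHUS_SCHEMA_VERSION;
  } catch {
    return false; // an unparseable breakdown is treated as legacy
  }
}

// Two scores are comparable only when both came from the same formula.
function comparable(a: StampedScore, b: StampedScore): boolean {
  return isHipparchusStamped(a) === isHipparchusStamped(b);
}

const legacy: StampedScore = { nodeId: "n1", score: 0.82, breakdownJson: '{"schema_version":1}' };
const fresh: StampedScore = { nodeId: "n2", score: 0.74, breakdownJson: '{"schema_version":2}' };

console.log(comparable(legacy, fresh)); // false
console.log(comparable(fresh, fresh));  // true
```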

Manual stoa-bench build + push

The ghost-CPU bench (Wave 2 of Force-fresh) runs inside a disposable Docker container pulled from GHCR at ghcr.io/stoachain/stoa-bench:<version>. CI automation for the build + push is not yet wired; operators rebuild and push manually on a developer machine when (and only when) the bench script or the Dockerfile base image changes.

When to rebuild. Only on bench-images/stoa-bench/bench.sh or bench-images/stoa-bench/Dockerfile changes (a new sub-test, a Debian base bump, a tooling change). Bump bench-images/stoa-bench/version in the same commit; the file is the explicit image tag the hub pulls (never :latest at bench time, to avoid stale-cache risk).

cd bench-images/stoa-bench
# The version file holds the explicit image tag the hub pulls.
VERSION="$(cat version)"
docker build -t "ghcr.io/stoachain/stoa-bench:${VERSION}" .
docker login ghcr.io -u <github-user>
docker push "ghcr.io/stoachain/stoa-bench:${VERSION}"
# Refresh :latest for manual debugging pulls only.
docker tag "ghcr.io/stoachain/stoa-bench:${VERSION}" ghcr.io/stoachain/stoa-bench:latest
docker push ghcr.io/stoachain/stoa-bench:latest

The docker login step prompts for a GitHub personal access token scoped to write:packages. Do not commit the real token into any file (or paste it into a shell-history-logged terminal). Pipe it from a password manager via --password-stdin if you prefer non-interactive auth. The :latest tag is maintained for convenience (manual debugging pulls); the hub itself always reads the explicit version from version.
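The "explicit version, never :latest" rule can be sketched as a small helper that turns the contents of the version file into a pinned image reference. The `pinnedImageRef` name and the sample tag `1.2.0` are hypothetical; the character check follows Docker's documented tag grammar (letters, digits, underscores, periods, dashes, at most 128 characters, not starting with a period or dash).

```typescript
const IMAGE_REPO = "ghcr.io/stoachain/stoa-bench";

// Turn the raw contents of the version file into a pinned image reference.
// Rejects empty tags, the floating "latest" tag, and tags Docker would refuse.
function pinnedImageRef(versionFileContents: string): string {
  const tag = versionFileContents.trim();
  if (tag === "" || tag === "latest") {
    throw new Error("bench image must be pinned to an explicit version");
  }
  if (!/^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$/.test(tag)) {
    throw new Error(`invalid image tag: ${tag}`);
  }
  return `${IMAGE_REPO}:${tag}`;
}

// The trailing newline mimics what reading a text file typically yields.
console.log(pinnedImageRef("1.2.0\n")); // ghcr.io/stoachain/stoa-bench:1.2.0
```

Failing loudly on an empty or floating tag keeps a misedited version file from silently pulling a stale image.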

What “no backfill” means

Hipparchus ships with no backfill: existing benchmark_runs rows on disk are not retroactively rescored under the new formula. The new top-level columns and subResults_json are default-null on legacy rows; the breakdown_json.schema_version stamp is the canonical pre/post discriminator.

Practical consequence: a legacy row continues to render its stamped legacy ServerScore in the UI until something overwrites it. The something is Force-fresh rebench (Wave 1 for Primes, Wave 2 for segregated slices via the ghost-CPU triple). After Force-fresh completes for a node, its row carries a Hipparchus-stamped score; before, it carries a legacy score.

  • No schema-side automatic re-evaluation. A restamp pass without a fresh bench cannot fabricate the missing raw sub-test measurements (the 8 CPU sub-tests, the new RAM-speed dimension, the 5-subtest network split). A fresh bench is the only way to produce them.
  • No silent overwrite. Force-fresh is operator-initiated; the hub never re-benches autonomously. You decide when the fleet transitions.
  • No legacy-data destruction. The migrations that landed the new columns + tables (077 + 078 + the ghost-cpu cache table) only add structure and never DROP. Rolling back to v.G.1.4* against the same SQLite file leaves every legacy row intact.
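The first point above implies a simple triage: a row whose subResults_json is null has no raw sub-test measurements, so only a fresh bench can move it forward. The sketch below illustrates that check; the `RunRow` shape and the `nextStep` helper are hypothetical and not part of the hub.

```typescript
// Hypothetical view of a benchmark_runs row; subResultsJson maps to the
// subResults_json column, which is null on legacy rows.
interface RunRow {
  nodeId: string;
  subResultsJson: string | null;
}

type NextStep = "fresh-bench" | "restamp-ok";

// A restamp needs raw sub-test measurements; without them only a fresh bench helps.
function nextStep(row: RunRow): NextStep {
  return row.subResultsJson === null ? "fresh-bench" : "restamp-ok";
}

const runRows: RunRow[] = [
  { nodeId: "n1", subResultsJson: null },           // legacy row
  { nodeId: "n2", subResultsJson: '{"cpu":[]}' },   // placeholder payload, illustrative only
];

const needsBench = runRows
  .filter((r) => nextStep(r) === "fresh-bench")
  .map((r) => r.nodeId);

console.log(needsBench); // [ 'n1' ]
```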

Operators wanting a uniform, fleet-wide Hipparchus picture must run Force-fresh once. Operators wanting to leave specific nodes on their legacy stamp can simply not include them in the rebench window, but the practical workflow is “run Force-fresh once, then forget about it.”

Further reading

  • /docs/scoring: the master scoring chapter (weights table, 100% baselines, chainweb-min baselines, reuse matrix, ghost-cache rule, distance thresholds).
  • /docs/scoring/network: the network subscore page (5-subtest table, Linode region list, upload-sink endpoint, multi-hub forward-compat note).
  • /docs/releases/hipparchus: the per-codename release page (operator-visible diff for H.1.0).