Hipparchus migration runbook

Hipparchus — v.H.1.0

Hipparchus rebuilds the ServerScore math from the ground up: CPU splits into 8 weighted sub-tests, RAM into quantity + speed, Network into 5 sub-tests with both 100% and chainweb-minimum baselines, and segregated slices inherit CPU through a ghost cache keyed on (host, cpu_model, vCPU). Every existing stamped row on disk was computed under the prior formula; this runbook describes how an operator migrates the fleet forward without losing data.

When to run Force-fresh fleet rebench

Run it after every Hipparchus shipment — and before treating the ServerScores rendered on /hub/nodes as authoritative. Pre-shipment stamped rows sit on the prior formula; Force-fresh overwrites them with Hipparchus-stamped values across every visible Prime + every distinct ghost-CPU class.

Affordance. The third action card on /hub/fleet-maintenance — labelled Force-fresh fleet rebench (Action D, alongside Restamp and the legacy Rebench cards).
The confirm dialog surfaces the wall-time estimate, the per-wave plan (Wave 1 Primes → Wave 2 ghost-CPU triples → optional Wave 3 drives → Wave 4 restamp), and the per-Prime downtime statement (each Prime stops its chainweb for ~5 min during Wave 1, credited back to the warmup counter automatically).
Wave 3 is opt-in. Default off for the Hipparchus rollout — the disk subscore formula is unchanged this rehaul, so the existing stamped per-drive figures stay valid. Toggle the checkbox in the confirm dialog only when you actually want to re-bench drives (e.g. after a drive replacement on a segregated host).
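The wave plan shown in the confirm dialog can be pictured with a minimal sketch like the one below. It is not the hub's implementation: the type names, field names, and the buildPlan and distinctGhostClasses helpers are all hypothetical, chosen only to show how slices sharing a (host, cpu_model, vCPU) triple collapse into one Wave 2 job and how Wave 3 appears only when opted in.

```typescript
// Hypothetical shapes: none of these names come from the hub's code.
type GhostTriple = { host: string; cpuModel: string; vcpu: number };

interface FleetInput {
  primes: string[];      // Prime host identifiers
  slices: GhostTriple[]; // one entry per segregated slice
  drives: string[];      // drive identifiers on segregated hosts
}

interface Wave {
  wave: 1 | 2 | 3 | 4;
  label: string;
  jobs: string[];
}

// Collapse slices that share (host, cpu_model, vCPU) into one ghost-CPU class.
function distinctGhostClasses(slices: GhostTriple[]): string[] {
  const keys = new Set<string>();
  for (const s of slices) {
    keys.add(JSON.stringify([s.host, s.cpuModel, s.vcpu]));
  }
  return Array.from(keys);
}

// Build the ordered wave list; Wave 3 is included only when the operator opts in.
function buildPlan(fleet: FleetInput, includeDrives = false): Wave[] {
  const plan: Wave[] = [
    { wave: 1, label: "Primes", jobs: [...fleet.primes] },
    { wave: 2, label: "ghost-CPU triples", jobs: distinctGhostClasses(fleet.slices) },
  ];
  if (includeDrives) {
    plan.push({ wave: 3, label: "drives", jobs: [...fleet.drives] });
  }
  plan.push({ wave: 4, label: "restamp", jobs: ["restamp"] });
  return plan;
}
```

With the default includeDrives = false, the returned plan contains waves 1, 2 and 4 only, matching the default-off checkbox described above.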

Expected wall-time

The hub computes the estimate from the enumerated plan once the action is triggered. The same formula is documented here so you can sanity-check the surfaced number against your own fleet before launching:

wallMinutes = (prime_hosts   × 6 min  ÷ 8)
            + (ghost_classes × 4 min  ÷ 8)
            + (drives        × 5 min  ÷ 8)   // only when Wave 3 opted in

The constants reflect pages/api/admin/nodes/benchmark-fleet.ts: 6 min per Prime full-host bench, 4 min per ghost-CPU class bench, 5 min per drive bench, divided by the 8-way parallelism cap the job worker enforces (MAX_CONCURRENT_BY_KIND for the benchmark kinds).

Worked example. A fleet of 5 Primes + 12 distinct ghost-CPU classes (the typical shape: a handful of host machines, each hosting two or three segregated slices on the same CPU triple) with Wave 3 left at its default off:

wallMinutes = (5  × 6 ÷ 8) + (12 × 4 ÷ 8) + (0 × 5 ÷ 8)
            = 3.75       + 6.0         + 0
            ≈ 10 min

Same fleet with Wave 3 opted in (say 4 distinct drives on the segregated hosts) adds 4 × 5 ÷ 8 = 2.5 min, for a total of 9.75 + 2.5 = 12.25 min (~12 min). For larger fleets the estimate scales linearly with the job counts: the waves run one after another, so their durations add, and the 8-way parallelism cap divides each wave's total by 8.
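For sanity-checking the surfaced number against your own fleet, the wallMinutes formula can be written as a small function. This is a sketch only: the constant and function names are illustrative and are not taken from benchmark-fleet.ts, although the numeric values are the ones documented in this section.

```typescript
// Values from the documented formula; the names are illustrative only.
const PRIME_BENCH_MIN = 6;       // minutes per Prime full-host bench
const GHOST_CLASS_BENCH_MIN = 4; // minutes per ghost-CPU class bench
const DRIVE_BENCH_MIN = 5;       // minutes per drive bench
const PARALLELISM_CAP = 8;       // concurrent benchmark jobs

// Pass drives = 0 (the default) when Wave 3 is not opted in.
function estimateWallMinutes(primeHosts: number, ghostClasses: number, drives = 0): number {
  const wave1 = (primeHosts * PRIME_BENCH_MIN) / PARALLELISM_CAP;
  const wave2 = (ghostClasses * GHOST_CLASS_BENCH_MIN) / PARALLELISM_CAP;
  const wave3 = (drives * DRIVE_BENCH_MIN) / PARALLELISM_CAP;
  return wave1 + wave2 + wave3;
}

console.log(estimateWallMinutes(5, 12));    // 9.75
console.log(estimateWallMinutes(5, 12, 4)); // 12.25
```

The two calls reproduce the worked example before rounding: 9.75 min with Wave 3 off, 12.25 min with four drives included.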

Mixed-formula display window

Between the moment Hipparchus deploys and the moment Force-fresh completes, /hub/nodes renders a mix of pre-Hipparchus stamped scores and Hipparchus-stamped scores. Each node renders under whatever formula its latest benchmark_runs row was stamped under; pre-shipment rows continue to display their legacy figure, freshly-rebenched rows display the new figure.

Don't draw fleet-wide conclusions during this window. Comparing a pre-Hipparchus 0.82 against a Hipparchus 0.74 is meaningless — they were produced by different formulae. Wait for the Wave 4 restamp signal (the action card surfaces a terminal-state badge once Wave 4 completes) before treating any comparison as authoritative.

The breakdown JSON carries a schema_version stamp on each row, so the underlying data distinguishes pre/post cleanly. Surface-side, the simplest discrimination is “was this row produced by the Force-fresh run that just completed?” — if yes, it's Hipparchus-stamped; if no, it's legacy.
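A data-side check for whether the mixed window is still open could look roughly like the sketch below. Several things here are assumptions, not facts from the hub: this runbook does not state the actual schema_version values, so the numeric threshold is a placeholder; node_id is a hypothetical column name; and treating a missing or unparseable breakdown as legacy is a design choice of this sketch.

```typescript
// ASSUMPTION: schema_version is numeric and Hipparchus rows carry a value at or
// above this threshold. The real values are not stated in this runbook.
const HIPPARCHUS_SCHEMA_VERSION = 2;

interface BenchmarkRunRow {
  node_id: string;               // hypothetical column name
  breakdown_json: string | null; // raw JSON text as stored on the row
}

type Formula = "hipparchus" | "legacy";

// Classify one row by the schema_version stamp inside its breakdown JSON.
function formulaOf(row: BenchmarkRunRow): Formula {
  if (row.breakdown_json === null) return "legacy";
  try {
    const parsed: unknown = JSON.parse(row.breakdown_json);
    if (typeof parsed === "object" && parsed !== null) {
      const v = (parsed as { schema_version?: unknown }).schema_version;
      if (typeof v === "number" && v >= HIPPARCHUS_SCHEMA_VERSION) return "hipparchus";
    }
  } catch {
    // Unparseable breakdown: treat as legacy rather than guess.
  }
  return "legacy";
}

// The window is still open while any node's latest row is legacy.
function mixedWindowOpen(latestRows: BenchmarkRunRow[]): boolean {
  return latestRows.some((r) => formulaOf(r) === "legacy");
}
```

Feeding mixedWindowOpen the latest benchmark_runs row per node gives a yes/no answer that does not depend on remembering which rows the most recent Force-fresh run touched.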

Manual `stoa-bench` build + push

The ghost-CPU bench (Wave 2 of Force-fresh) runs inside a disposable Docker container pulled from GHCR at ghcr.io/stoachain/stoa-bench:<version>. CI automation for the build + push is not yet wired; operators rebuild and push manually on a developer machine when (and only when) the bench script or the Dockerfile base image changes.

When to rebuild. Rebuild only when bench-images/stoa-bench/bench.sh or bench-images/stoa-bench/Dockerfile changes (a new sub-test, a Debian base bump, a tooling change). Bump bench-images/stoa-bench/version in the same commit; the file holds the explicit image tag the hub pulls (never :latest at bench time, to avoid stale-cache risk).
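The pinning rule can be illustrated with a short sketch that turns the contents of the version file into an image reference and refuses :latest. The function name pinnedImageRef is hypothetical and the hub's real lookup may differ; the tag pattern is Docker's documented tag grammar (up to 128 characters of letters, digits, underscore, period and hyphen, not starting with a period or hyphen).

```typescript
const IMAGE_REPO = "ghcr.io/stoachain/stoa-bench";

// Docker tag grammar: first char is a word character, then up to 127 of [A-Za-z0-9_.-].
const TAG_PATTERN = /^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$/;

// Hypothetical helper: build the pinned reference from the version file's text.
function pinnedImageRef(versionFileContents: string): string {
  const tag = versionFileContents.trim(); // drop the trailing newline
  if (!TAG_PATTERN.test(tag)) {
    throw new Error(`invalid image tag in version file: "${tag}"`);
  }
  if (tag === "latest") {
    throw new Error("refusing to bench against :latest; pin an explicit version");
  }
  return `${IMAGE_REPO}:${tag}`;
}
```

In practice the argument would be the text read from bench-images/stoa-bench/version, so a forgotten version bump shows up as an unchanged reference rather than a silently stale pull.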

cd bench-images/stoa-bench
docker build -t ghcr.io/stoachain/stoa-bench:$(cat version) .
docker login ghcr.io -u <github-user>
docker push ghcr.io/stoachain/stoa-bench:$(cat version)
docker tag ghcr.io/stoachain/stoa-bench:$(cat version) ghcr.io/stoachain/stoa-bench:latest
docker push ghcr.io/stoachain/stoa-bench:latest

The docker login step prompts for a GitHub personal access token scoped to write:packages. Do not commit the real token to any file, and do not paste it into a terminal whose input is recorded in shell history. For non-interactive auth, pipe it from a password manager via --password-stdin. The :latest tag is maintained for convenience (manual debugging pulls); the hub itself always reads the explicit version from the version file.

What “no backfill” means

Hipparchus ships with no backfill: existing benchmark_runs rows on disk are not retroactively rescored under the new formula. The new top-level columns and subResults_json are default-null on legacy rows; the breakdown_json.schema_version stamp is the canonical pre/post discriminator.

Practical consequence: a legacy row continues to render its stamped legacy ServerScore in the UI until something overwrites it. That something is a Force-fresh rebench (Wave 1 for Primes, Wave 2 for segregated slices via the ghost-CPU triple). After Force-fresh completes for a node, its row carries a Hipparchus-stamped score; before, it carries a legacy score.

No schema-side automatic re-evaluation. A restamp pass without a fresh bench cannot fabricate the missing raw sub-test measurements (the 8 CPU sub-tests, the new RAM-speed dimension, the 5-sub-test network split). A fresh bench is the only way to produce them.
No silent overwrite. Force-fresh is operator-initiated; the hub never re-benches autonomously. You decide when the fleet transitions.
No legacy-data destruction. The migrations that landed the new columns + tables (077 + 078 + the ghost-cpu cache table) only add structure — never DROP. Rolling back to v.G.1.4* against the same SQLite file leaves every legacy row intact.

Operators wanting a uniform, fleet-wide Hipparchus picture must run Force-fresh once. Operators wanting to leave specific nodes on their legacy stamp can simply not include them in the rebench window — but the practical workflow is “run Force-fresh once, then forget about it.”