Pheidippides — H.1.1
v.Chronos.1.Pheidippides — legacy v.H.1.1
The bench runs alone on the remote and reports back at the end — no need to keep an SSH session alive for the whole 3–5 minutes.
Pheidippides is the bench transport overhaul. The pre-H.1.1 hub opened a single long-lived SSH session to the bench target and streamed the entire 3–5 minute bench's stdout back over it. That made the bench's success contingent on a residential NAT, a home router, or any other intermediate network layer keeping the SSH connection alive for the full duration, a contract many networks silently broke. H.1.1 flips the model: SSH carries only a ~10 second kickoff command that curls the bench script from the hub via HTTPS, launches it under `nohup setsid` so SIGHUP cannot kill it, and returns immediately. The remote bench then runs detached, posting a progress event to the hub via HTTPS as it crosses each phase boundary, and sending a finalize POST carrying the complete bench log when it exits. The hub-side handler waits on the database for the finalize event and feeds the decoded log into the existing score-synthesis pipeline. Score math is unchanged; only the transport flipped.
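As a rough illustration of the kickoff step described above, the following Python sketch renders the kind of short command the SSH session might run. The function name `build_kickoff_command`, the hub URL, and the script path `/tmp/bench.sh` are hypothetical and not taken from the release; only the `/api/bench-script/<token>` route, the `nohup setsid` launch, and the `/tmp/log` redirect come from these notes. The actual hub may assemble its command differently.

```python
import shlex


def build_kickoff_command(
    hub_base_url: str,
    token: str,
    script_path: str = "/tmp/bench.sh",  # hypothetical path, not from the release notes
    log_path: str = "/tmp/log",
) -> str:
    """Render a short shell command: fetch the bench script, launch it detached."""
    script_url = f"{hub_base_url.rstrip('/')}/api/bench-script/{token}"

    # -f: fail on HTTP errors, -sS: quiet but still print errors, -o: write to file.
    fetch = f"curl -fsS -o {shlex.quote(script_path)} {shlex.quote(script_url)}"

    # The braces scope the trailing '&' to the launch only, so curl finishes
    # in the foreground and only the bench itself is backgrounded.
    launch = (
        f"{{ nohup setsid bash {shlex.quote(script_path)} "
        f"> {shlex.quote(log_path)} 2>&1 < /dev/null & }}"
    )
    return f"{fetch} && {launch}"


if __name__ == "__main__":
    print(build_kickoff_command("https://hub.example", "abc123"))
```

Because the launch is backgrounded with all three standard streams redirected, the SSH exec channel has nothing left to wait on and can close as soon as the download completes.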
Why “Pheidippides”
Pheidippides (~490 BCE) was the Athenian hemerodromos — long-distance messenger — who ran from Marathon to Athens to deliver news of the Persian defeat. The legend captures the shape of the new bench architecture: the runner sets out alone, completes the distance autonomously, and reports back at the end. No one is required to keep watching him the whole way for the message to arrive. H.1.1 gives the bench the same property: the hub kicks off the work and walks away; the remote completes the bench on its own schedule; the result arrives via a separate channel when ready.
Headline changes
- SSH session lifetime decoupled from bench duration. A 3–5 minute bench used to require a 3–5 minute SSH session. It now uses a ~5–10 second SSH session for kickoff, then SSH closes and the bench runs autonomously on the remote. NAT timeouts, ssh2 internals, or residential link quirks during the bench’s execution no longer interrupt the bench.
- Bench script fetched via HTTPS. The hub mints a per-bench token, stores the rendered bench script under it, and the remote curls `/api/bench-script/<token>` via HTTPS as part of kickoff. The 32 KB bench-script body no longer rides the SSH exec command line, which closes a long-standing class of SSH-protocol-level failure modes.
- Push-based progress reporting. The bench script POSTs a progress event to `/api/bench/progress` after every phase boundary (CPU sub-tests, disk, network sub-tests, RAM, commitment). The hub stores these in `bench_progress_events`, deduplicated on `(job_id, sequence)` for retry idempotency. Network blips during the bench cost a single event retry instead of the whole bench.
- Finalize POST carries the complete log. When the bench script exits, its EXIT trap base64-encodes the full bench log (capped at 192 KB raw) and POSTs it to `/api/bench/finalize` along with the exit code. The hub decodes this and feeds it into the existing score-synthesis pipeline. The score numbers are computed from the same bytes the previous architecture would have streamed; only the delivery channel changed.
- Bench survives SSH closing. The kickoff launches the bench under `nohup setsid bash … > /tmp/log 2>&1 < /dev/null &`. `setsid` moves the bench into a new session, and therefore a new process group with no controlling terminal, so the SIGHUP that fires when SSH’s controlling-tty disappears cannot reach it. `nohup` is the belt-and-suspenders backup. Stdin redirected from `/dev/null` ensures the bench cannot accidentally block on a read.
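The retry idempotency and the finalize encoding described in the list above can be sketched in Python with an in-memory SQLite database. This is a minimal illustration under assumptions, not the hub's code: the column layout, the helper names (`open_store`, `record_event`, `encode_log`, `decode_log`), and the choice to keep the first 192 KB when the log exceeds the cap are all hypothetical. The release states only that events are deduplicated on `(job_id, sequence)` and that the raw log is capped at 192 KB.

```python
import base64
import sqlite3

MAX_RAW_LOG_BYTES = 192 * 1024  # cap on the raw log, as stated in the release notes


def open_store() -> sqlite3.Connection:
    """Create an in-memory table keyed on (job_id, sequence). Columns are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE bench_progress_events (
               job_id   TEXT    NOT NULL,
               sequence INTEGER NOT NULL,
               phase    TEXT    NOT NULL,
               payload  TEXT,
               PRIMARY KEY (job_id, sequence)
           )"""
    )
    return conn


def record_event(conn, job_id, sequence, phase, payload=None) -> bool:
    """Store an event; return False when a retry re-sends an already stored one."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO bench_progress_events VALUES (?, ?, ?, ?)",
        (job_id, sequence, phase, payload),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the duplicate was ignored


def encode_log(raw: bytes) -> str:
    """Remote side: cap the raw log, then base64-encode it for the finalize POST.

    Keeping the head of the log is an assumption made for this sketch.
    """
    return base64.b64encode(raw[:MAX_RAW_LOG_BYTES]).decode("ascii")


def decode_log(encoded: str) -> bytes:
    """Hub side: recover the log bytes before handing them to score synthesis."""
    return base64.b64decode(encoded, validate=True)


if __name__ == "__main__":
    conn = open_store()
    print(record_event(conn, "job-1", 1, "cpu"))  # first delivery is stored
    print(record_event(conn, "job-1", 1, "cpu"))  # retried delivery is ignored
    print(decode_log(encode_log(b"bench output")))
```

Putting the uniqueness constraint in the table itself means a retried POST is harmless even if two deliveries race, since the database rejects the second row rather than the handler having to check first.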
For operators
The bench UI works the same way it did before — every existing per-step detail panel, every dimension tile, every Stoicism Eligibility number continues to render from the same backend shapes. The change is invisible at the operator surface; the only thing you may notice is that benches on residential-NAT hosts now complete cleanly with all CPU sub-tests filled in, where previously they would land with several EBENCH_PARSE red flags from the SSH stream dying mid-bench.
The per-bench audit trail lives in two tables: `bench_job_tokens` (one row per bench, carrying the rendered script body, the first observed remote IP, and the lifecycle timestamps), and `bench_progress_events` (one row per phase boundary plus a finalize row with the full log). Both are retained for 30 days after the bench; expired tokens are cleaned up automatically.
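To make the 30-day retention concrete, here is a hedged Python sketch of what a cleanup pass over the two tables could look like. The column names (`token`, `job_id`, `first_remote_ip`, `finished_at`, and so on), the use of Unix-epoch seconds, and the functions `create_schema` and `purge_expired` are assumptions made for illustration; the release does not document the schema or how the automatic cleanup is implemented.

```python
import sqlite3

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30 days, per the release notes


def create_schema(conn: sqlite3.Connection) -> None:
    """Create both audit tables with an assumed, simplified column layout."""
    conn.executescript(
        """
        CREATE TABLE bench_job_tokens (
            token           TEXT PRIMARY KEY,
            job_id          TEXT NOT NULL UNIQUE,
            script_body     TEXT NOT NULL,
            first_remote_ip TEXT,
            created_at      INTEGER NOT NULL,  -- epoch seconds
            finished_at     INTEGER            -- NULL while the bench is running
        );
        CREATE TABLE bench_progress_events (
            job_id   TEXT    NOT NULL,
            sequence INTEGER NOT NULL,
            phase    TEXT    NOT NULL,
            PRIMARY KEY (job_id, sequence)
        );
        """
    )


def purge_expired(conn: sqlite3.Connection, now: int) -> int:
    """Delete benches that finished more than 30 days before `now`.

    Returns the number of token rows removed. Benches with no finish time
    are left alone in this sketch.
    """
    cutoff = now - RETENTION_SECONDS
    with conn:  # one transaction: events first, then their tokens
        conn.execute(
            """DELETE FROM bench_progress_events
               WHERE job_id IN (SELECT job_id FROM bench_job_tokens
                                WHERE finished_at < ?)""",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM bench_job_tokens WHERE finished_at < ?", (cutoff,)
        )
    return cur.rowcount
```

Deleting the events before the tokens inside a single transaction avoids leaving orphaned progress rows if the pass is interrupted partway through.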
Related
- Hipparchus — H.1.0 — the immediately preceding release; benchmark-score overhaul that introduced the per-sub-test scoring surface this release carries forward unchanged.
- /docs/scoring — the canonical scoring documentation, untouched by H.1.1 since the math did not change.