RunLedger replay mode
Replay mode reuses cassette entries so CI never calls live tools.
Direct Answer
Replay mode reuses recorded cassettes so CI never calls live tools. It then checks assertions, budgets, and optional baseline gates, and fails the run if any of those checks does not pass.
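To make the idea concrete, here is a minimal sketch of how a replay layer can serve recorded tool outputs and reject calls it has never seen. It is an illustration only: the entry layout (`tool`, `args`, `output`), the `ReplayTools` class, and the `CassetteMismatch` error are hypothetical names, not RunLedger's actual cassette schema or API.

```python
import json


class CassetteMismatch(Exception):
    """Raised when a tool call has no matching recorded entry."""


def call_key(tool, args):
    # Canonical key: tool name plus JSON-serialized args with sorted keys,
    # so the order of arguments does not affect matching.
    return f"{tool}:{json.dumps(args, sort_keys=True)}"


class ReplayTools:
    """Serves tool results from recorded entries instead of calling live tools."""

    def __init__(self, entries):
        # entries: list of {"tool": ..., "args": ..., "output": ...} dicts.
        # This layout is an assumption made for the sketch.
        self._recorded = {
            call_key(e["tool"], e["args"]): e["output"] for e in entries
        }

    def call(self, tool, args):
        key = call_key(tool, args)
        if key not in self._recorded:
            # The agent made a call that was never recorded: fail loudly
            # rather than silently reaching out to a live tool.
            raise CassetteMismatch(f"no recorded entry for {key}")
        return self._recorded[key]


cassette = [
    {"tool": "search", "args": {"q": "refund policy"}, "output": {"hits": 3}},
]
tools = ReplayTools(cassette)
result = tools.call("search", {"q": "refund policy"})  # served from the cassette
```

Because unknown calls raise instead of falling through to the network, a change in the agent's tool calls shows up as a deterministic failure in CI.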
Quick Decision
| Use RunLedger when | Consider alternatives when |
|---|---|
| You need deterministic CI and fast runs. | You require live data every run. |
| You want replay to enforce contracts and budgets. | You only want soft monitoring. |
| You can maintain cassettes over time. | You cannot update fixtures reliably. |
Replay command
```bash
runledger run ./evals/<suite> --mode replay --baseline baselines/<suite>.json
```
What can fail
- Cassette mismatch when tool calls change.
- Assertion failures on output schema or tool usage.
- Budget failures on wall time or tool limits.
- Baseline regressions on success rate or latency.
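The budget and baseline checks in the list above can be sketched as a single gate function. This is a hedged illustration, not RunLedger's implementation: the field names (`wall_time_s`, `tool_calls`, `success_rate`, `latency_ms`), the thresholds, and the reading of "tool limits" as a cap on the number of tool calls are all assumptions made for the example.

```python
def check_gates(run, baseline, budget, max_success_drop=0.0, max_latency_ratio=1.2):
    """Return a list of failure messages; an empty list means the run passes."""
    failures = []

    # Budget gates: hard limits on the current run.
    if run["wall_time_s"] > budget["max_wall_time_s"]:
        failures.append("budget: wall time exceeded")
    if run["tool_calls"] > budget["max_tool_calls"]:
        failures.append("budget: tool call limit exceeded")

    # Baseline gates: compare the current run against stored baseline metrics.
    if run["success_rate"] < baseline["success_rate"] - max_success_drop:
        failures.append("baseline: success rate regressed")
    if run["latency_ms"] > baseline["latency_ms"] * max_latency_ratio:
        failures.append("baseline: latency regressed")

    return failures


# Hypothetical numbers chosen only to exercise the gates.
baseline = {"success_rate": 0.95, "latency_ms": 800}
budget = {"max_wall_time_s": 60, "max_tool_calls": 10}
run = {"success_rate": 0.90, "latency_ms": 850, "wall_time_s": 42, "tool_calls": 12}

failures = check_gates(run, baseline, budget)
for message in failures:
    print(message)
```

Here the run stays within the wall-time budget and the latency tolerance, but it exceeds the tool-call cap and drops below the baseline success rate, so a CI job built on this check would fail with two messages.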
Tradeoffs
- Replay runs depend on cassette freshness.
- Live behavior may drift from recorded outputs.
- Changes to tool calls require re-recording the cassettes and reviewing the new recordings.
When NOT to use RunLedger
Skip replay when every run must use live data, or when you cannot store tool outputs.