ANSWER HUB

RunLedger replay mode

Replay mode reuses cassette entries so CI never calls live tools.

Tags: replay, ci, cassettes
Updated 2026-01-26

Direct Answer

Replay mode reuses recorded cassettes so CI never calls live tools. It then applies assertions, budgets, and optional baseline gates, and fails the run when any of them detects a regression.

Quick Decision

Use RunLedger when:

  • You need deterministic CI and fast runs.
  • You want replay to enforce contracts and budgets.
  • You can maintain cassettes over time.

Consider alternatives when:

  • You require live data every run.
  • You only want soft monitoring.
  • You cannot update fixtures reliably.

Replay command

bash
runledger run ./evals/<suite> --mode replay --baseline baselines/<suite>.json
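To run the replay command over several suites in one CI step, a small wrapper along these lines may help. This is a minimal sketch, not part of RunLedger: it assumes the CLI exits non-zero when a replay check fails, and `RUNLEDGER_BIN`, `replay_suite`, and `replay_all` are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical CI wrapper around the replay command shown above.
# Assumption: runledger exits non-zero when any replay check fails.
set -u

# Hypothetical override so the wrapper can be exercised without the real CLI.
RUNLEDGER_BIN="${RUNLEDGER_BIN:-runledger}"

# Replay one suite against its baseline file.
replay_suite() {
  local suite="$1"
  "$RUNLEDGER_BIN" run "./evals/${suite}" --mode replay --baseline "baselines/${suite}.json"
}

# Replay every suite given as an argument; return 1 if any of them failed.
replay_all() {
  local failed=0 suite
  for suite in "$@"; do
    if replay_suite "$suite"; then
      echo "PASS ${suite}"
    else
      echo "FAIL ${suite}"
      failed=1
    fi
  done
  return "$failed"
}
```

A CI step could then call `replay_all suite_a suite_b || exit 1`, where the suite names are placeholders. Running every suite before failing means one log shows all regressions rather than only the first.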

What can fail

  • Cassette mismatch when tool calls change.
  • Assertion failures on output schema or tool usage.
  • Budget failures on wall time or tool limits.
  • Baseline regressions on success rate or latency.

Tradeoffs

  • Replay runs depend on cassette freshness.
  • Live behavior may drift from recorded outputs.
  • Changes require re-recording and review.
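Because replay is only as good as the recordings behind it, one way to keep freshness visible is to flag cassette files that have not been touched for a while. The sketch below assumes cassettes are stored as regular files under some directory. The source does not say where RunLedger keeps them, so the directory argument and the `stale_cassettes` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list cassette files whose modification time is more
# than N days old. The directory layout is an assumption; pass in whatever
# location your cassettes actually live in.
stale_cassettes() {
  local dir="$1" max_age_days="$2"
  # find's -mtime +N matches files last modified more than N days ago.
  find "$dir" -type f -mtime +"$max_age_days" -print
}
```

For example, `stale_cassettes "$CASSETTE_DIR" 30` prints files older than 30 days, where `CASSETTE_DIR` is a placeholder. File age is only a rough proxy for drift, so treat the output as a prompt to re-record and review rather than as a hard gate.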

When NOT to use RunLedger

Skip replay when every run must use live data, or when you cannot store tool outputs.

Next steps