ANSWER HUB

Deterministic agent CI

Record once, replay in CI to eliminate flaky tool calls and enforce contracts.

Tags: ci, deterministic, record-replay. Updated 2026-01-26.

Direct Answer

Deterministic CI with RunLedger means recording tool calls once and replaying them in CI with assertions, budgets, and baseline gates.
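To make the record-once, replay-later idea concrete, here is a minimal generic sketch in bash. It is not RunLedger's implementation, and the names `CASSETTE_DIR`, `MODE`, `cassette_key`, and `tool_call` are hypothetical, chosen only for illustration. In record mode the real tool runs and its output is saved; in replay mode the saved output is returned and an unrecorded call fails.

```bash
#!/usr/bin/env bash
# Generic record/replay sketch. NOT RunLedger's implementation.
# CASSETTE_DIR, MODE, cassette_key, and tool_call are hypothetical names.

CASSETTE_DIR="${CASSETTE_DIR:-./cassettes}"
MODE="${MODE:-replay}"   # "record" or "replay"

# Derive a stable file name from the command and its arguments.
cassette_key() {
  printf '%s\n' "$@" | cksum | cut -d' ' -f1
}

# Run a tool through the cassette layer.
tool_call() {
  local file
  file="$CASSETTE_DIR/$(cassette_key "$@").out"
  if [ "$MODE" = "record" ]; then
    mkdir -p "$CASSETTE_DIR"
    # Execute the real tool, save its output, then echo it back.
    "$@" > "$file" && cat "$file"
  elif [ -f "$file" ]; then
    # Replay: return the saved output without making a live call.
    cat "$file"
  else
    # The call was never recorded, so the test fails instead of going live.
    echo "cassette mismatch: no recording for: $*" >&2
    return 1
  fi
}
```

For example, `MODE=record tool_call curl -s https://example.com` would store the response once, and later runs with `MODE=replay` would read it back without touching the network. Failing on an unrecorded call is the design choice that keeps CI deterministic: a changed tool call surfaces as an error rather than a silent live request.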

Quick Decision

Use RunLedger when:

  • External tools make CI flaky.
  • You want hard pass/fail gates.
  • You can maintain cassettes.

Consider alternatives when:

  • Your tests are already deterministic.
  • You only want qualitative reviews.
  • You need live data every run.

3-step recipe

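bash
# 1. Record the tool calls once.
runledger run ./evals/demo --mode record
# 2. Promote the recorded run to a baseline (replace RUN_ID with the recorded run's ID).
runledger baseline promote --from runledger_out/demo/RUN_ID --to baselines/demo.json
# 3. Replay in CI and gate against the baseline.
runledger run ./evals/demo --mode replay --baseline baselines/demo.json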
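In a CI job, the replay step can be wrapped in a small gate function. The sketch below uses only the replay command shown above. The helper name `replay_gate` is hypothetical, and the sketch assumes, without confirmation from this page, that `runledger` exits non-zero when a replay, assertion, budget, or baseline check fails.

```bash
#!/usr/bin/env bash
# CI gate sketch. Assumption (not confirmed by the source): `runledger`
# exits non-zero when replay, assertions, budgets, or the baseline check fail.

# replay_gate SUITE_DIR BASELINE_FILE   (hypothetical helper name)
replay_gate() {
  local suite="$1" baseline="$2"
  # Fail early with a clear message if no baseline has been promoted yet.
  if [ ! -f "$baseline" ]; then
    echo "missing baseline: $baseline (record and promote one first)" >&2
    return 2
  fi
  # The function's exit status is whatever the replay run returns.
  runledger run "$suite" --mode replay --baseline "$baseline"
}
```

A CI step could then call `replay_gate ./evals/demo baselines/demo.json || exit 1`, so a missing baseline and a failed replay both stop the pipeline.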

Signals you get

  • Cassette mismatch when tool calls change.
  • Assertion failures on output contracts.
  • Budget failures on latency or tool usage.
  • Baseline regressions on success rate or latency.
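The last signal, a baseline regression, amounts to comparing the current run's metrics with stored ones. This page does not show RunLedger's thresholds or file format, so the following is only a generic sketch. The function `regression_gate`, its argument order, and the percentage tolerance are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Generic baseline-regression check. NOT RunLedger's actual logic.
# regression_gate and its parameters are hypothetical.

# regression_gate BASE_RATE CUR_RATE BASE_LAT_MS CUR_LAT_MS MAX_LAT_INCREASE_PCT
regression_gate() {
  awk -v br="$1" -v cr="$2" -v bl="$3" -v cl="$4" -v pct="$5" 'BEGIN {
    # Any drop in success rate counts as a regression.
    if (cr < br) {
      print "regression: success rate " cr " is below baseline " br
      exit 1
    }
    # Latency may grow by at most pct percent over the baseline.
    if (cl > bl * (1 + pct / 100)) {
      print "regression: latency " cl "ms exceeds baseline " bl "ms by more than " pct "%"
      exit 1
    }
    print "ok"
  }'
}
```

With made-up numbers, `regression_gate 0.95 0.95 1000 1050 10` passes because 1050 ms is within 10% of 1000 ms, while a current latency of 1200 ms or a success rate of 0.90 returns exit status 1.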

Tradeoffs

  • Requires cassette maintenance over time.
  • Live behavior may drift from recorded fixtures.
  • Initial setup adds overhead compared to unit tests.

When NOT to use RunLedger

Avoid deterministic replay if you must hit live tools on every CI run.
