Deterministic agent CI
Record once, replay in CI to eliminate flaky tool calls and enforce contracts.
Direct Answer
Deterministic CI with RunLedger means recording tool calls once and replaying them in CI with assertions, budgets, and baseline gates.
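To make the record/replay idea concrete, here is a toy sketch in bash. It is not RunLedger's implementation or cassette format. `tool_call`, `MODE`, and `CASSETTE_DIR` are hypothetical names, and the hashing step assumes GNU coreutils `sha256sum`. In record mode it calls the real command and saves the output. In replay mode it returns the saved output. An unrecorded call fails, which is roughly what a cassette mismatch signals.

```bash
#!/usr/bin/env bash
# Toy record/replay sketch: NOT RunLedger's implementation or cassette format.
# MODE, CASSETTE_DIR and tool_call are hypothetical names for this illustration.
MODE="${MODE:-replay}"                       # "record" or "replay"
CASSETTE_DIR="${CASSETTE_DIR:-./cassettes}"

# tool_call CMD [ARGS...]: record or replay the output of one "tool call".
tool_call() {
  local key file
  # Key each recording by the exact command line (assumes GNU coreutils sha256sum).
  key=$(printf '%s\n' "$@" | sha256sum | cut -d' ' -f1)
  file="$CASSETTE_DIR/$key"

  if [ "$MODE" = "record" ]; then
    mkdir -p "$CASSETTE_DIR"
    # Call the live tool once and save its output; drop partial output on failure.
    if ! "$@" > "$file"; then
      rm -f "$file"
      return 1
    fi
    cat "$file"
  elif [ -f "$file" ]; then
    # Replay: return the recorded output without calling the live tool.
    cat "$file"
  else
    # No recording for this exact call: the tool calls changed since recording.
    echo "cassette mismatch: no recording for: $*" >&2
    return 1
  fi
}
```

Recording once with `MODE=record` and then running with `MODE=replay` returns the same output on every run, because recorded calls never reach the live tool.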
Quick Decision
| Use RunLedger when | Consider alternatives when |
|---|---|
| External tools make CI flaky. | Your tests are already deterministic. |
| You want hard pass/fail gates. | You only want qualitative reviews. |
| You can maintain cassettes. | You need live data every run. |
3-step recipe
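```bash
# 1. Record: run the eval against live tools once and save the tool calls as cassettes.
runledger run ./evals/demo --mode record
# 2. Promote the recorded run to a baseline (RUN_ID is a placeholder for that run's directory).
runledger baseline promote --from runledger_out/demo/RUN_ID --to baselines/demo.json
# 3. Replay in CI: serve tool calls from the cassettes and gate on assertions, budgets, and the baseline.
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```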
Signals you get
- Cassette mismatch when tool calls change.
- Assertion failures on output contracts.
- Budget failures on latency or tool usage.
- Baseline regressions on success rate or latency.
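The budget and baseline signals are pass/fail threshold checks. The sketch below shows that kind of gating logic in bash. It is a hypothetical illustration, not RunLedger's actual checks or baseline schema. `budget_gate`, `baseline_gate`, and the default tolerances (an absolute 0.02 drop in success rate, a relative 10% rise in latency) are assumptions. Each function returns non-zero on failure, so a CI step can fail on it.

```bash
#!/usr/bin/env bash
# Toy CI gate logic: NOT RunLedger's actual checks, thresholds, or baseline schema.
# Each function prints a reason and returns 1 on failure, 0 on pass.

# budget_gate NAME OBSERVED LIMIT: fail when OBSERVED exceeds LIMIT.
budget_gate() {
  local name=$1 observed=$2 limit=$3
  if awk -v o="$observed" -v l="$limit" 'BEGIN { exit !(o + 0 > l + 0) }'; then
    echo "budget exceeded: $name=$observed (limit $limit)" >&2
    return 1
  fi
  return 0
}

# baseline_gate CUR_SUCCESS BASE_SUCCESS CUR_LATENCY_MS BASE_LATENCY_MS [SR_TOL] [LAT_TOL]
# Fails if the success rate drops by more than SR_TOL (absolute, default 0.02)
# or latency grows by more than LAT_TOL (relative, default 0.10 = 10%).
baseline_gate() {
  local cur_sr=$1 base_sr=$2 cur_lat=$3 base_lat=$4
  local sr_tol=${5:-0.02} lat_tol=${6:-0.10} status=0
  if awk -v c="$cur_sr" -v b="$base_sr" -v t="$sr_tol" 'BEGIN { exit !(c + 0 < b - t) }'; then
    echo "regression: success rate $cur_sr vs baseline $base_sr" >&2
    status=1
  fi
  if awk -v c="$cur_lat" -v b="$base_lat" -v t="$lat_tol" 'BEGIN { exit !(c + 0 > b * (1 + t)) }'; then
    echo "regression: latency ${cur_lat}ms vs baseline ${base_lat}ms" >&2
    status=1
  fi
  return "$status"
}
```

In a CI step, a call such as `baseline_gate 0.95 0.96 500 480 || exit 1` fails the job on a regression. The numbers are made-up examples, not measured results.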
Tradeoffs
- Requires cassette maintenance over time.
- Live behavior may drift from recorded fixtures.
- Initial setup adds overhead compared to unit tests.
When NOT to use RunLedger
Avoid deterministic replay if you must hit live tools on every CI run.