COMPARISON
RunLedger vs Snapshot Tests
Snapshot tests are great for deterministic outputs. RunLedger is built for tool-using agents that need record/replay in CI.
Direct Answer
Recommendation
Use RunLedger when agent behavior depends on tools or external APIs; use snapshots for static outputs.
Snapshot tests verify deterministic output strings. RunLedger captures tool call behavior and gates regressions across multi-step workflows.
Quick Decision
| Use RunLedger when | Use snapshot tests when |
|---|---|
| Tool calls make CI flaky | Outputs are stable and easy to diff |
| You need schema + budget gates | You only need output snapshots |
When snapshot tests are better
- Your output is purely deterministic and easy to snapshot.
- You want the lightest possible local test loop.
- You only need to diff rendered text or JSON outputs.
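As a rough illustration of that lightweight loop, a snapshot check can be as small as the sketch below. The names `render_report` and `check_snapshot` and the `.snap` file layout are hypothetical, chosen only for this example; real snapshot libraries offer richer diffing and update workflows.

```python
import json
from pathlib import Path


def render_report(items):
    # Hypothetical deterministic function under test: same input, same string.
    payload = {"count": len(items), "items": sorted(items)}
    return json.dumps(payload, indent=2, sort_keys=True)


def check_snapshot(name, actual, snapshot_dir):
    """Return True if `actual` matches the stored snapshot.

    On the first run no snapshot exists yet, so the current output is
    written to disk and accepted as the reference.
    """
    path = Path(snapshot_dir) / f"{name}.snap"
    if not path.exists():
        path.write_text(actual, encoding="utf-8")
        return True
    return path.read_text(encoding="utf-8") == actual
```

This works only because `render_report` is deterministic. Once the output depends on a live tool or API response, the stored string stops being a reliable reference.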
When RunLedger wins
- Tool calls, APIs, or databases introduce nondeterminism.
- You need to enforce tool order, schemas, and budgets.
- You want CI gating on regressions, not just diffs.
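To make the record/replay and gating ideas concrete, here is a minimal conceptual sketch. It is not RunLedger's API or cassette format: the `Cassette` class, the `check_gates` helper, and the entry layout are all assumptions made up for illustration.

```python
import json


class Cassette:
    """Hypothetical record/replay store for tool calls (illustration only)."""

    def __init__(self, mode, entries=None):
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.entries = list(entries or [])
        self.calls = []  # tool names seen during this run, in call order

    def call(self, name, args, tool=None):
        # Key each call by tool name plus arguments so replay can look it up.
        key = json.dumps({"tool": name, "args": args}, sort_keys=True)
        self.calls.append(name)
        if self.mode == "record":
            result = tool(**args)  # hit the real tool once and store the result
            self.entries.append({"key": key, "result": result})
            return result
        for entry in self.entries:
            if entry["key"] == key:
                return entry["result"]  # replay: no network, no nondeterminism
        raise KeyError(f"no recorded call for {key}")


def check_gates(calls, expected_order, max_calls):
    """Return a list of gate failures; an empty list means the run passes."""
    failures = []
    if calls != expected_order:
        failures.append(f"tool order {calls} != expected {expected_order}")
    if len(calls) > max_calls:
        failures.append(f"{len(calls)} tool calls exceed budget of {max_calls}")
    return failures
```

In CI, the replay run would fail the build whenever `check_gates` returns a non-empty list, which is the difference between gating on a regression and merely showing a diff.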
Tradeoffs
- RunLedger requires suites, cassettes, and baseline management.
- It adds upfront configuration compared to a single snapshot file.
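```bash
# Record mode: run the suite and capture tool calls
runledger run ./evals/demo --mode record
# Replay mode: rerun from the recording and compare against the baseline
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```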
When NOT to use RunLedger
If your agent never calls tools and output snapshots are sufficient, a snapshot test is simpler.