COMPARISON
RunLedger vs Mocks
Mocks are precise for unit tests. RunLedger captures real tool behavior once and replays it in CI.
Direct Answer
Recommendation
Use RunLedger when maintaining mocks becomes a bottleneck, or when your mocks miss real tool behavior.
Mocks are excellent for unit-level isolation. RunLedger is better for end-to-end agent flows where tool behavior and ordering matter.
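The record/replay idea can be sketched in plain bash. This is a minimal illustration under stated assumptions, not RunLedger's implementation: the `call_tool` helper, the `MODE` variable, and the one-file-per-tool cassette layout are all hypothetical. In record mode the real command runs and its output is saved. In replay mode the saved output is returned and the real tool is never called.

```bash
#!/usr/bin/env bash
# Hypothetical record/replay wrapper; not RunLedger's implementation or cassette format.
set -euo pipefail

MODE="${MODE:-replay}"                        # "record" or "replay"
CASSETTE_DIR="${CASSETTE_DIR:-$(mktemp -d)}"  # where recorded outputs live

# call_tool NAME CMD [ARGS...]
#   record: run the real command and save its stdout as NAME's cassette.
#   replay: print NAME's saved cassette without running the command.
call_tool() {
  local name="$1"
  shift
  local cassette="$CASSETTE_DIR/$name.out"
  if [[ "$MODE" == "record" ]]; then
    mkdir -p "$CASSETTE_DIR"
    "$@" | tee "$cassette"
  elif [[ -f "$cassette" ]]; then
    cat "$cassette"
  else
    echo "no cassette for '$name'; re-record first" >&2
    return 1
  fi
}
```

In this sketch, replaying a call that was never recorded fails loudly instead of silently falling back to the live tool, which is what keeps a replayed run deterministic.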
Quick Decision
| Use RunLedger when | Use mocks when |
|---|---|
| You want full flow determinism | You want isolated unit behavior |
| You need the recorded shapes of real tool responses | You can maintain small mock fixtures |
When mocks are better
- You only need to test a single function or module.
- You want complete control over every edge case.
- You do not want to set up a runner or maintain a suite config.
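A mock, by contrast, is hand-written code that stands in for the tool. The sketch below uses a bash function override as the mock; `describe_forecast` and `weather_tool` are hypothetical names chosen for illustration. Because the fixture is authored rather than recorded, it can produce an edge case such as a rate-limit failure on demand.

```bash
#!/usr/bin/env bash
# Hypothetical unit under test and its mock; all names are illustrative only.
set -euo pipefail

# Unit under test: wraps the tool's reply in a sentence, or reports that the tool failed.
describe_forecast() {
  local reply
  if ! reply="$(weather_tool "$1")"; then
    echo "forecast unavailable"
    return 0
  fi
  echo "forecast for $1: $reply"
}

# Mock: replaces the real tool with a hand-written fixture.
# Being plain code, it can fake failures (here, a rate limit) at will.
weather_tool() {
  case "$1" in
    oslo)    echo "rain" ;;
    limited) echo "429 Too Many Requests" >&2; return 1 ;;
    *)       echo "unknown city" >&2; return 1 ;;
  esac
}
```

The tradeoff is the one described above: the mock gives full control over each case, but nothing keeps its replies in sync with what the real tool returns.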
When RunLedger wins
- Mocks drift from real tool responses.
- You need to ensure tool ordering and schema adherence.
- You want to gate PRs on full workflow regressions.
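One way to picture an ordering check is to compare the sequence of tool calls from the current run with the sequence captured in a baseline, and fail on any difference. This sketch assumes both are plain files with one tool name per line. That format and the `check_tool_order` helper are hypothetical, not RunLedger's baseline format, and schema checks are left out.

```bash
#!/usr/bin/env bash
# Hypothetical ordering check: each file lists one tool name per line, in call order.
# This is not RunLedger's baseline format.
set -euo pipefail

# check_tool_order BASELINE CURRENT
# Returns 0 when both runs called the same tools in the same order;
# otherwise prints a unified diff of the two sequences and returns 1.
check_tool_order() {
  local baseline="$1" current="$2"
  if diff -u "$baseline" "$current"; then
    echo "tool order matches baseline"
  else
    echo "tool order regression" >&2
    return 1
  fi
}
```

A non-zero return is what would let a CI job gate the PR on the regression.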
Tradeoffs
- Cassettes can get stale and need re-recording.
- Less surgical than unit tests for tiny components.
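Stale cassettes can be surfaced with a routine age check. This sketch assumes cassettes are ordinary files under one directory and uses a 30-day default threshold. Both the layout and the threshold are assumptions, not RunLedger defaults, and `stale_cassettes` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical staleness report; directory layout and age threshold are assumptions.
set -euo pipefail

# stale_cassettes DIR [MAX_AGE_DAYS]
# Prints, sorted, every file under DIR last modified more than MAX_AGE_DAYS
# (default 30) days ago.
stale_cassettes() {
  local dir="$1" max_age="${2:-30}"
  find "$dir" -type f -mtime +"$max_age" | sort
}
```

Any file it lists is a candidate for re-recording in record mode.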
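```bash
# Record once: run the suite against the real tools and capture their behavior.
runledger run ./evals/demo --mode record
# Replay in CI: rerun from the recording and check it against the baseline file.
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```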
When NOT to use RunLedger
If you only need isolated unit tests with tiny fixtures, mocks are simpler.