COMPARISON

RunLedger vs Mocks

Mocks are precise for unit tests. RunLedger captures real tool behavior once and replays it in CI.

comparison mocks testing Updated 2026-01-23

Direct Answer

Recommendation: Use RunLedger when mock maintenance becomes a bottleneck or when mocks miss real tool behavior.

Mocks are excellent for unit-level isolation. RunLedger is better for end-to-end agent flows where tool behavior and ordering matter.

Quick Decision

Use RunLedger when                  | Use mocks when
You want full flow determinism      | You want isolated unit behavior
You need recorded real tool shapes  | You can maintain small mock fixtures

When mocks are better

  • You only need to test a single function or module.
  • You want complete control over every edge case.
  • You do not want to set up a runner or maintain a suite config.
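
As an illustration of the unit-level case above, here is a minimal sketch using Python's standard `unittest.mock`. The function `summarize_weather` and the weather tool are hypothetical names invented for this example; the point is only that a mock gives direct control over both the happy path and a forced failure.

```python
from unittest.mock import Mock


def summarize_weather(city, fetch_weather):
    """Hypothetical unit under test: formats the output of a weather tool."""
    data = fetch_weather(city)
    return f"{city}: {data['temp_c']} C"


# Replace the real tool with a mock that returns a hand-written fixture.
fake_tool = Mock(return_value={"temp_c": 21})
result = summarize_weather("Oslo", fake_tool)
fake_tool.assert_called_once_with("Oslo")

# Edge case: force the tool to fail, which is hard to trigger on demand
# with a real service.
failing_tool = Mock(side_effect=TimeoutError("tool timed out"))
try:
    summarize_weather("Oslo", failing_tool)
    timeout_propagated = False
except TimeoutError:
    timeout_propagated = True
```

The fixture `{"temp_c": 21}` is written by hand, which is exactly what makes it precise and also what lets it drift if the real tool changes its response shape.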

When RunLedger wins

  • Mocks drift from real tool responses.
  • You need to ensure tool ordering and schema adherence.
  • You want to gate PRs on full workflow regressions.
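
To make the ordering and schema points above concrete, the following is a generic record-and-replay sketch. It is not RunLedger's implementation or cassette format: the `ReplayTools` class, the JSON layout, and the tool names are all assumptions made for illustration.

```python
import json

# Hypothetical cassette: tool calls captured from one real run, in order.
CASSETTE_JSON = """
[
  {"tool": "search", "args": {"query": "refund policy"}, "result": {"hits": 3}},
  {"tool": "fetch_doc", "args": {"doc_id": "kb-12"}, "result": {"title": "Refunds"}}
]
"""


class ReplayTools:
    """Serves recorded results in order and fails on any divergence."""

    def __init__(self, recorded):
        self._recorded = list(recorded)
        self._cursor = 0

    def call(self, tool, **args):
        if self._cursor >= len(self._recorded):
            raise AssertionError(f"unexpected extra call to {tool!r}")
        expected = self._recorded[self._cursor]
        # Ordering check: the next call must be the next recorded tool.
        if tool != expected["tool"]:
            raise AssertionError(
                f"call {self._cursor}: expected {expected['tool']!r}, got {tool!r}"
            )
        # Simplified schema check: argument names must match the recording.
        if set(args) != set(expected["args"]):
            raise AssertionError(
                f"call {self._cursor}: argument names {sorted(args)} "
                f"do not match recorded {sorted(expected['args'])}"
            )
        self._cursor += 1
        return expected["result"]

    def assert_exhausted(self):
        if self._cursor != len(self._recorded):
            raise AssertionError("agent stopped before making all recorded calls")


tools = ReplayTools(json.loads(CASSETTE_JSON))
first = tools.call("search", query="refund policy")
second = tools.call("fetch_doc", doc_id="kb-12")
tools.assert_exhausted()
```

In this sketch only argument names are compared, not values; a stricter replay could compare values or validate against a full schema. Calling `fetch_doc` before `search` raises immediately, which is the kind of regression a hand-written mock per function would not notice.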

Tradeoffs

  • Cassettes can get stale and need re-recording.
  • RunLedger is less surgical than unit tests for tiny components.

```bash
# Record real tool behavior once.
runledger run ./evals/demo --mode record
# Replay the recording in CI with a baseline file.
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```
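
One way to manage the staleness tradeoff is to flag recordings older than a chosen age. The sketch below assumes each recording carries a `recorded_at` ISO 8601 timestamp; that field, the `is_stale` helper, and the 30-day threshold are hypothetical and not something RunLedger is known to provide.

```python
from datetime import datetime, timedelta, timezone


def is_stale(recorded_at_iso, max_age_days, now=None):
    """Return True if the recording is older than max_age_days."""
    recorded_at = datetime.fromisoformat(recorded_at_iso)
    now = now or datetime.now(timezone.utc)
    return now - recorded_at > timedelta(days=max_age_days)


# Hypothetical metadata stored next to a recording.
meta = {"recorded_at": "2026-01-01T00:00:00+00:00"}

# Fixed reference times keep the example deterministic.
two_weeks_later = datetime(2026, 1, 15, tzinfo=timezone.utc)
two_months_later = datetime(2026, 3, 1, tzinfo=timezone.utc)

still_fresh = is_stale(meta["recorded_at"], 30, now=two_weeks_later)
needs_rerecord = is_stale(meta["recorded_at"], 30, now=two_months_later)
```

A timestamp stored with the recording is used here rather than file modification time, because a fresh git checkout in CI resets modification times.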

When NOT to use RunLedger

If you only need isolated unit tests with tiny fixtures, mocks are simpler.
