
RunLedger vs Snapshot Tests

Snapshot tests are great for deterministic outputs. RunLedger is built for tool-using agents that need record/replay in CI.

Tags: comparison, snapshot-tests, ci. Updated 2026-01-23.

Direct Answer

Recommendation: Use RunLedger when agent behavior depends on tools or external APIs; use snapshot tests for static outputs.

Snapshot tests compare output against a stored copy, so they only work when that output is deterministic. RunLedger captures tool-call behavior and gates regressions across multi-step workflows.
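Record/replay is what separates the two approaches. The toy wrapper below sketches the idea for a single shell "tool". In record mode the real command runs and its output is saved to a cassette file. In replay mode the saved output is returned without running the command, so repeated CI runs see identical tool results. The `tool_call` function, the `cassettes/` directory, and the checksum-based key are all hypothetical. This is not RunLedger's implementation or cassette format.

```bash
# Toy record/replay wrapper (hypothetical; not RunLedger's cassette format).
# Usage: tool_call <record|replay> <command> [args...]
CASSETTE_DIR="${CASSETTE_DIR:-cassettes}"

tool_call() {
  local mode="$1"
  shift
  local key tape
  # Key each recording by a checksum of the joined arguments (toy scheme).
  key="$(printf '%s\n' "$*" | cksum | cut -d' ' -f1)"
  tape="$CASSETTE_DIR/$key.out"
  case "$mode" in
    record)
      mkdir -p "$CASSETTE_DIR"
      "$@" > "$tape" || return 1   # live call; save its stdout
      cat "$tape"
      ;;
    replay)
      if [ ! -f "$tape" ]; then
        echo "no recording for: $*" >&2
        return 1                     # an unrecorded call fails the run
      fi
      cat "$tape"                    # deterministic: the tool never runs
      ;;
    *)
      echo "mode must be record or replay" >&2
      return 2
      ;;
  esac
}
```

For example, run `tool_call record curl -s https://example.com/api` once locally. Then run `tool_call replay curl -s https://example.com/api` in CI, which returns the recorded response without touching the network.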

Quick Decision

Use RunLedger when                 | Use snapshot tests when
-----------------------------------|------------------------------------
Tool calls make CI flaky           | Outputs are stable and easy to diff
You need schema and budget gates   | You only need output snapshots

When snapshot tests are better

  • Your output is purely deterministic and easy to snapshot.
  • You want the lightest possible local test loop.
  • You only need to diff rendered text or JSON outputs.
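For reference, the lightest snapshot loop needs no framework at all. You store the output once, then diff later runs against it. Below is a minimal sketch, assuming a hypothetical `snapshot_check` helper and a `snapshots/` directory.

```bash
# Hypothetical golden-file snapshot check.
# Usage: snapshot_check <name> <command> [args...]
# First run stores the command's stdout; later runs diff against it.
snapshot_check() {
  local name="$1"
  shift
  local snap="snapshots/${name}.snap"
  local actual
  actual="$("$@")" || return 2          # the command itself failed
  if [ ! -f "$snap" ]; then
    mkdir -p "$(dirname "$snap")"
    printf '%s\n' "$actual" > "$snap"   # first run: record the snapshot
    echo "created $snap"
    return 0
  fi
  # Nonzero exit (and a unified diff) when the output has changed.
  printf '%s\n' "$actual" | diff -u "$snap" -
}
```

For example, `snapshot_check report ./render_report.sh` works if the rendered output never changes between runs. Here `render_report.sh` stands in for whatever produces the output. Once that output depends on live tool calls, the stored copy goes stale and the check turns flaky.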

When RunLedger wins

  • Tool calls, APIs, or databases introduce nondeterminism.
  • You need to enforce tool order, schemas, and budgets.
  • You want CI gating on regressions, not just diffs.
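Gating means a run fails, and the CI job exits nonzero, when behavior drifts, rather than just producing a diff to review. The sketch below shows two such gates over a recorded trace. It assumes a hypothetical trace format of one tool name per line. The call count must stay within a budget, and the tool order must match a baseline. Schema validation is omitted. Neither the `gate_run` function nor the trace format is RunLedger's.

```bash
# Hypothetical regression gate (not RunLedger's format or logic).
# trace/baseline: text files with one tool name per line, in call order.
# Returns nonzero on failure so a CI step running it fails the build.
gate_run() {
  local trace="$1" baseline="$2" max_calls="$3"
  local calls
  # Count recorded tool calls; tr strips padding some wc builds emit.
  calls=$(wc -l < "$trace" | tr -d ' ')
  if [ "$calls" -gt "$max_calls" ]; then
    echo "FAIL: $calls tool calls exceeds budget of $max_calls" >&2
    return 1
  fi
  # Tool order must match the baseline exactly.
  if ! diff -q "$baseline" "$trace" > /dev/null; then
    echo "FAIL: tool order differs from baseline" >&2
    diff -u "$baseline" "$trace" >&2
    return 1
  fi
  echo "PASS: $calls tool calls, order matches baseline"
}
```

A plain snapshot diff would show the reordered calls. A gate like this turns them into a failing check.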

Tradeoffs

  • Requires suites, cassettes, and baseline management.
  • Adds upfront configuration compared to a single snapshot file.
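The basic loop is one record run, then replay runs checked against a baseline:

```bash
# Record: run the suite and capture tool calls into cassettes
runledger run ./evals/demo --mode record
# Replay: rerun from the cassettes and compare against the stored baseline
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```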

When NOT to use RunLedger

If your agent never calls tools and output snapshots are sufficient, a snapshot test is simpler.
