COMPARISON

RunLedger vs Snapshot Tests

Snapshot tests are great for deterministic outputs. RunLedger is built for tool-using agents that need record/replay in CI.

Tags: comparison, snapshot-tests, ci. Updated 2026-01-23.

Direct Answer

Recommendation: Use RunLedger when agent behavior depends on tools or external APIs; use snapshots for static outputs.

Snapshot tests compare an output against a stored copy, so they only work when that output is deterministic. RunLedger instead captures an agent's tool-call behavior and gates regressions across multi-step workflows.

Use RunLedger when	Use snapshot tests when
Tool calls make CI flaky	Outputs are stable and easy to diff
You need schema + budget gates	You only need output snapshots

bash

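# Record a run of the demo eval suite
runledger run ./evals/demo --mode record
# Replay the recorded run and compare it against the stored baseline
runledger run ./evals/demo --mode replay --baseline baselines/demo.json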
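The general record/replay idea behind these two commands can be sketched in a few lines of bash. This is an illustration under stated assumptions, not RunLedger's implementation or file format: the `call_tool` wrapper, the `MODE` and `CASSETTE_DIR` variables, and the one-file-per-call layout are all hypothetical, and `sha256sum` assumes GNU coreutils.

```bash
# Hypothetical record/replay wrapper: an illustration, not RunLedger's implementation.
# MODE=record runs the real tool and saves its stdout; MODE=replay returns the saved copy.
CASSETTE_DIR="${CASSETTE_DIR:-./cassette}"  # hypothetical storage directory
MODE="${MODE:-replay}"                      # "record" or "replay"

# Derive a stable key from the exact command line (assumes GNU coreutils sha256sum).
cassette_key() {
  printf '%s\0' "$@" | sha256sum | cut -d' ' -f1
}

# call_tool CMD [ARGS...]: run or replay one tool invocation.
call_tool() {
  local file
  file="$CASSETTE_DIR/$(cassette_key "$@").out"
  if [ "$MODE" = "record" ]; then
    mkdir -p "$CASSETTE_DIR"
    # Save the tool's output; drop a partial recording if the tool fails.
    "$@" > "$file" || { rm -f "$file"; return 1; }
    cat "$file"
  elif [ -f "$file" ]; then
    # Replay: the real tool is never invoked.
    cat "$file"
  else
    echo "no recording for: $*" >&2
    return 1
  fi
}
```

In CI such a wrapper would run with `MODE=replay`, so a tool that returns different data on each call (search results, prices, timestamps) no longer changes the test's inputs. Keying recordings on the exact command line is the simplest choice; it also means a change to a tool call's arguments surfaces as a missing recording rather than a silent pass.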

If your agent never calls tools and output snapshots are sufficient, a snapshot test is simpler.
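For comparison, a snapshot test needs nothing beyond storing an output once and diffing against it afterwards. The sketch below is a minimal bash illustration; `snapshot_test`, `SNAPSHOT_DIR`, and the `.snap` file layout are hypothetical and not taken from any particular framework.

```bash
# Hypothetical snapshot test helper: the first run stores the output,
# later runs fail if the output differs from the stored snapshot.
SNAPSHOT_DIR="${SNAPSHOT_DIR:-./snapshots}"  # hypothetical storage directory

# snapshot_test NAME CMD [ARGS...]
snapshot_test() {
  local name="$1" snap actual
  shift
  snap="$SNAPSHOT_DIR/$name.snap"
  actual="$("$@")" || return 1
  if [ ! -f "$snap" ]; then
    mkdir -p "$SNAPSHOT_DIR"
    printf '%s\n' "$actual" > "$snap"
    echo "new snapshot written: $snap"
    return 0
  fi
  # diff exits 0 when the output matches the snapshot and 1 (printing a diff) when it does not.
  if printf '%s\n' "$actual" | diff -u "$snap" -; then
    echo "snapshot matches: $name"
  else
    echo "snapshot mismatch: $name" >&2
    return 1
  fi
}
```

This passes only while the output is byte-for-byte stable. If the output embeds a live tool result, such as a fetched price or a timestamp, the diff fails on later runs even when the code is correct, which is the flakiness the comparison table attributes to tool calls.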
