COMPARISON
RunLedger vs Golden Files
Golden files work for deterministic outputs. RunLedger is built for multi-step tool workflows.
Direct Answer
Recommendation
Use RunLedger when tool calls and their ordering matter; use golden files for stable, tool-free outputs.
Golden files snapshot output artifacts. RunLedger replays tool calls and enforces contracts and budgets across steps.
Quick Decision
| Use RunLedger when | Use golden files when |
|---|---|
| You need deterministic tool replay | You only need output file diffs |
| You need CI gating for regressions | You want simple golden snapshots |
When golden files are better
- Your output is a single deterministic artifact.
- You want simple diffs with minimal setup.
- There are no external tool calls.
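To make the golden-file case concrete, here is a minimal sketch of a snapshot check in Python. It is not tied to any particular framework: `render_report` and `check_against_golden` are hypothetical names chosen for illustration, and the diff uses only the standard library's `difflib`.

```python
import difflib
from pathlib import Path


def render_report(items):
    """Hypothetical pure function: the same input always yields the same text."""
    lines = [f"{name}: {count}" for name, count in sorted(items.items())]
    return "\n".join(lines) + "\n"


def check_against_golden(actual, golden_path, update=False):
    """Compare output with the stored golden file.

    Returns an empty list on a match, otherwise the unified-diff lines.
    With update=True the golden file is (re)written instead of compared.
    """
    golden_path = Path(golden_path)
    if update:
        golden_path.write_text(actual, encoding="utf-8")
        return []
    expected = golden_path.read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="golden",
            tofile="actual",
            lineterm="",
        )
    )
```

A test would call `check_against_golden(render_report(data), "report.golden.txt")` and fail if the returned diff is non-empty. This is the whole setup, which is why golden files suit a single deterministic artifact.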
When RunLedger wins
- Agent behavior depends on tool sequences and schemas.
- You need to catch regressions in tool ordering or budgets.
- You want deterministic CI with reusable cassettes.
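The idea behind cassette replay can be sketched in a few lines of Python. This is an illustration of the general technique only: the cassette layout (a list of `tool`/`args`/`result` entries), the `CassetteReplayer` class, and the `max_calls` budget are assumptions made for this example, not RunLedger's actual file format or API.

```python
import json


class ReplayError(Exception):
    """Raised when a run diverges from the recorded cassette."""


class CassetteReplayer:
    """Serves recorded tool results in order instead of calling real tools."""

    def __init__(self, entries, max_calls):
        self.entries = list(entries)
        self.max_calls = max_calls  # budget: maximum number of tool calls
        self.position = 0

    def call(self, tool, args):
        if self.position >= self.max_calls:
            raise ReplayError(f"budget exceeded: more than {self.max_calls} tool calls")
        if self.position >= len(self.entries):
            raise ReplayError(f"unexpected extra call to {tool!r}")
        recorded = self.entries[self.position]
        # Ordering and arguments must match the recording exactly.
        if recorded["tool"] != tool or recorded["args"] != args:
            raise ReplayError(
                f"call {self.position}: expected {recorded['tool']!r}, got {tool!r}"
            )
        self.position += 1
        return recorded["result"]

    def assert_finished(self):
        remaining = len(self.entries) - self.position
        if remaining:
            raise ReplayError(f"{remaining} recorded call(s) were never made")


# Hypothetical cassette contents, inlined here to keep the sketch self-contained.
CASSETTE = json.loads("""
[
  {"tool": "search", "args": {"query": "refund policy"}, "result": ["doc-7"]},
  {"tool": "fetch", "args": {"doc_id": "doc-7"}, "result": "Refunds within 30 days."}
]
""")


def agent(call_tool):
    """Hypothetical two-step agent: search, then fetch the top hit."""
    hits = call_tool("search", {"query": "refund policy"})
    return call_tool("fetch", {"doc_id": hits[0]})
```

Running `agent(CassetteReplayer(CASSETTE, max_calls=5).call)` never touches a real tool, so the result is deterministic in CI. An agent that calls `fetch` before `search`, or that makes more calls than the budget allows, raises `ReplayError`, which is the kind of regression a golden file of the final output alone would not show.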
Tradeoffs
- RunLedger requires additional config and runner setup.
- Promoting a baseline adds an extra step to the workflow.
```bash
runledger run ./evals/demo --mode record
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```
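To show why baseline promotion is an extra step, the following Python sketch models a baseline as a JSON file of metrics that a run is gated against. The metric names, the "higher is worse" rule, and the `find_regressions` and `promote` helpers are all assumptions made for illustration; the actual contents of `baselines/demo.json` are not described here.

```python
import json


def find_regressions(baseline, current, tolerance=0.0):
    """Return messages for metrics where the current run is worse than baseline.

    Assumes every metric is "higher is worse" (e.g. tool calls, latency).
    tolerance=0.1 allows a 10% increase before flagging.
    """
    problems = []
    for name, base_value in baseline.items():
        if name not in current:
            problems.append(f"{name}: missing from current run")
            continue
        limit = base_value * (1 + tolerance)
        if current[name] > limit:
            problems.append(f"{name}: {current[name]} exceeds limit {limit}")
    return problems


def promote(current, path):
    """Accept the current metrics as the new baseline (the extra manual step)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(current, fh, indent=2, sort_keys=True)
```

In CI, a non-empty result from `find_regressions` would fail the build. When a change in behavior is intentional, someone has to run the equivalent of `promote` and commit the new baseline, which is the added cost compared with overwriting a golden file.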
When NOT to use RunLedger
If your output is already deterministic and tool-free, golden files are simpler.