COMPARISON

RunLedger vs Golden Files

Golden files work for deterministic outputs. RunLedger is built for multi-step tool workflows.

Tags: comparison, golden-files, ci. Updated 2026-01-23

Direct Answer

Recommendation: Use RunLedger when tool calls and their ordering matter; use golden files for stable, pure outputs.

Golden files snapshot output artifacts. RunLedger replays tool calls and enforces contracts and budgets across steps.
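To make the distinction concrete, here is a minimal Python sketch of the two kinds of check. It is not RunLedger's API: `ToolCall`, `check_golden`, `check_trace`, and the `max_calls` budget are hypothetical names chosen for illustration, and the trace format is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One recorded tool invocation (hypothetical trace format)."""
    name: str
    args: tuple  # (key, value) pairs, so calls compare by value


def check_golden(actual_output: str, golden_output: str) -> bool:
    """Golden-file style: only the final artifact is compared."""
    return actual_output == golden_output


def check_trace(actual, baseline, max_calls):
    """Trace style: compare tool calls step by step and enforce a call budget.

    Returns a list of human-readable failures; an empty list means the run passes.
    """
    failures = []
    if len(actual) > max_calls:
        failures.append(f"budget exceeded: {len(actual)} calls > {max_calls}")
    for step, (got, want) in enumerate(zip(actual, baseline)):
        if got != want:
            failures.append(f"step {step}: expected {want.name}, got {got.name}")
    if len(actual) != len(baseline):
        failures.append(f"call count changed: {len(baseline)} -> {len(actual)}")
    return failures


baseline = [
    ToolCall("search", (("q", "refund policy"),)),
    ToolCall("fetch", (("doc_id", 7),)),
]
# Same tools in swapped order: the final answer could still match a golden file.
reordered = [baseline[1], baseline[0]]

print(check_golden("Refunds take 5 days.", "Refunds take 5 days."))
print(check_trace(reordered, baseline, max_calls=3))
```

In this sketch the golden check passes because the final text is unchanged, while the trace check reports a failure at each step where the tool order diverged from the baseline.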

Use RunLedger when	Use golden files when
You need deterministic tool replay	You only need output file diffs
You need CI gating for regressions	You want simple golden snapshots

bash

runledger run ./evals/demo --mode record
runledger run ./evals/demo --mode replay --baseline baselines/demo.json

If your output is already deterministic and tool-free, golden files are simpler.
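For that simpler case, a golden-file check can be as small as the following Python sketch. The `render_report` function and the file name are hypothetical stand-ins for any pure, tool-free output; the snapshot is written on the first run and compared on later runs.

```python
import tempfile
from pathlib import Path


def render_report(items):
    """Hypothetical pure function: the same input always yields the same text."""
    lines = [f"{name}: {count}" for name, count in sorted(items.items())]
    return "\n".join(lines) + "\n"


def matches_golden(actual: str, golden_path: Path, update: bool = False) -> bool:
    """Compare output with the stored snapshot.

    The snapshot is (re)written when it is missing or when update=True.
    """
    if update or not golden_path.exists():
        golden_path.write_text(actual, encoding="utf-8")
        return True
    return golden_path.read_text(encoding="utf-8") == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "report.golden.txt"
    first = matches_golden(render_report({"b": 2, "a": 1}), golden)  # creates the snapshot
    same = matches_golden(render_report({"a": 1, "b": 2}), golden)   # identical output
    drift = matches_golden(render_report({"a": 1, "b": 3}), golden)  # output changed
    print(first, same, drift)
```

The check says nothing about how the output was produced, which is why it suits deterministic outputs and falls short once tool calls and their order need to be verified.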

