COMPARISON

RunLedger vs Golden Files

Golden files work for deterministic outputs. RunLedger is built for multi-step tool workflows.

Tags: comparison, golden-files, ci. Updated 2026-01-23

Direct Answer

Recommendation: Use RunLedger when tool calls and their ordering matter; use golden files for stable, pure outputs.

Golden files snapshot output artifacts. RunLedger replays tool calls and enforces contracts and budgets across steps.
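To make the replay idea concrete, here is a minimal sketch of serving recorded tool calls back in order and failing on any divergence. The `Cassette` class, its entry layout, and the tool names are hypothetical illustrations, not RunLedger's actual format or API.

```python
class ReplayMismatch(Exception):
    """Raised when a call diverges from the recording."""


class Cassette:
    """Hypothetical recording: an ordered list of tool calls and their results."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._cursor = 0

    def replay(self, tool, args):
        # Fail if the agent makes more calls than were recorded.
        if self._cursor >= len(self._entries):
            raise ReplayMismatch(f"unexpected extra call to {tool!r}")
        expected = self._entries[self._cursor]
        # Fail if the tool name or arguments differ from the recording.
        if expected["tool"] != tool or expected["args"] != args:
            raise ReplayMismatch(
                f"step {self._cursor}: expected {expected['tool']!r} "
                f"with {expected['args']!r}, got {tool!r} with {args!r}"
            )
        self._cursor += 1
        return expected["result"]


# Assumed example recording with two steps.
cassette = Cassette([
    {"tool": "search", "args": {"q": "refund policy"}, "result": ["doc-1"]},
    {"tool": "fetch", "args": {"id": "doc-1"}, "result": "Refunds within 30 days."},
])

hits = cassette.replay("search", {"q": "refund policy"})
text = cassette.replay("fetch", {"id": hits[0]})
```

Because results come from the recording rather than a live service, the same inputs always produce the same run, which is what makes this style of check usable in CI.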

Quick Decision

Use RunLedger when                    | Use golden files when
You need deterministic tool replay    | You only need output file diffs
You need CI gating for regressions    | You want simple golden snapshots

When golden files are better

  • Your output is a single deterministic artifact.
  • You want simple diffs with minimal setup.
  • There are no external tool calls.
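For the case described in the list above, a golden-file check can be as small as the following sketch. The helper name `check_golden` and the file name are assumptions for illustration; the diff uses Python's standard `difflib`.

```python
import difflib
import tempfile
from pathlib import Path


def check_golden(actual: str, golden_path: Path, update: bool = False) -> str:
    """Return an empty string on match, otherwise a unified diff."""
    if update or not golden_path.exists():
        # First run or explicit update: store the output as the new snapshot.
        golden_path.write_text(actual, encoding="utf-8")
        return ""
    expected = golden_path.read_text(encoding="utf-8")
    if expected == actual:
        return ""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="golden",
        tofile="actual",
    )
    return "".join(diff)


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "report.golden.txt"
    first = check_golden("total: 3\n", golden)    # creates the snapshot
    same = check_golden("total: 3\n", golden)     # matches the snapshot
    changed = check_golden("total: 4\n", golden)  # yields a unified diff
```

Writing the snapshot automatically on the first run is a convenience choice in this sketch; a stricter setup might require an explicit update step so that a missing golden file fails the check.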

When RunLedger wins

  • Agent behavior depends on tool sequences and schemas.
  • You need to catch regressions in tool ordering or budgets.
  • You want deterministic CI with reusable cassettes.
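The kind of regression described above, a changed tool order or a blown budget, can be sketched as a comparison between a baseline run and a candidate run. The `Run` type, the token-based budget, and the 10% growth threshold are all assumptions chosen for illustration; RunLedger's own baseline format and budget rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    tools: tuple  # tool names in call order
    tokens: int   # total tokens spent across all steps


def regressions(baseline: Run, candidate: Run, max_token_growth: float = 0.10):
    """Return a list of human-readable regression messages."""
    problems = []
    # Ordering check: the sequence of tool names must match exactly.
    if candidate.tools != baseline.tools:
        problems.append(
            f"tool order changed: {list(baseline.tools)} -> {list(candidate.tools)}"
        )
    # Budget check: allow limited growth over the baseline.
    limit = baseline.tokens * (1 + max_token_growth)
    if candidate.tokens > limit:
        problems.append(f"token budget exceeded: {candidate.tokens} > {limit:.0f}")
    return problems


baseline = Run(tools=("search", "fetch", "summarize"), tokens=1000)
ok = regressions(baseline, Run(("search", "fetch", "summarize"), 1050))
bad = regressions(baseline, Run(("fetch", "search", "summarize"), 1300))
```

A CI job could fail the build whenever the returned list is non-empty. A golden-file diff on the final output would miss both problems if the final text happened to stay the same.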

Tradeoffs

  • RunLedger requires additional config and runner setup.
  • Promoting a baseline adds an extra step.

Record a run, then replay it against a baseline (bash):

    runledger run ./evals/demo --mode record
    runledger run ./evals/demo --mode replay --baseline baselines/demo.json

When NOT to use RunLedger

If your output is already deterministic and tool-free, golden files are simpler.
