
RunLedger vs Integration Tests

Integration tests validate live systems. RunLedger provides deterministic CI gates for tool-using agents.

Tags: comparison, integration-tests, ci. Updated 2026-01-23.

Direct Answer

Recommendation: Use RunLedger as the deterministic CI gate, and reserve integration tests for staging or periodic checks.

Integration tests are valuable for validating real systems, but because they depend on live services, they can be slow and flaky. RunLedger replays recorded tool calls, so CI runs are fast and deterministic.
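The idea behind replay can be sketched in a few lines of shell. This is a generic record/replay pattern, not RunLedger's implementation: `call_tool`, `MODE`, and `CASSETTE_DIR` are hypothetical names. In record mode the real tool runs and its output is saved under a key derived from the command line. In replay mode the saved output is returned and the tool never runs, so repeated runs see identical data.

```bash
#!/usr/bin/env bash
# Generic record/replay sketch (hypothetical; not RunLedger's implementation).
# record: run the real tool and save its stdout to a cassette file.
# replay: return the saved stdout without executing the tool.
set -euo pipefail

MODE="${MODE:-replay}"                     # hypothetical: record | replay
CASSETTE_DIR="${CASSETTE_DIR:-./cassettes}" # hypothetical storage location

call_tool() {
  local key file
  # Stable key: checksum of the full argument list (NUL-separated).
  key=$(printf '%s\0' "$@" | cksum | cut -d' ' -f1)
  file="$CASSETTE_DIR/$key.out"

  if [ "$MODE" = "record" ]; then
    mkdir -p "$CASSETTE_DIR"
    "$@" | tee "$file"                     # live call; output is also saved
  elif [ -f "$file" ]; then
    cat "$file"                            # replay the recorded output
  else
    echo "no recording for: $*" >&2        # replay miss: fail, never go live
    return 1
  fi
}
```

A replay miss fails instead of silently falling back to a live call; that choice is what keeps the run deterministic. The cost is the tradeoff this page describes: recordings go stale when the external system changes.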

Quick Decision

| Use RunLedger when | Use integration tests when |
| --- | --- |
| You need fast, deterministic CI | You need live system validation |
| You want replayed tool calls | You require real external responses on every run |

When integration tests are better

  • You need to validate live third-party behavior.
  • You are exercising infrastructure changes.
  • You can tolerate slower runs and occasional flakiness.

When RunLedger wins

  • You want stable, fast CI for every PR.
  • You need repeatable tool outputs so that regressions stand out clearly.
  • You want to gate merges on deterministic results.
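Because replayed runs are deterministic, a merge gate can be as strict as an exact comparison with a committed baseline. The sketch below is a generic illustration, not how RunLedger compares results. `gate_against_baseline` is a hypothetical helper, and it assumes each run writes its results to a file that can be diffed against the baseline.

```bash
#!/usr/bin/env bash
# Hypothetical merge gate (sketch): exact comparison against a baseline file.
# Only meaningful when the run is deterministic, e.g. with replayed tool calls.
set -euo pipefail

gate_against_baseline() {
  local baseline="$1" current="$2"
  # diff exits 0 when identical, 1 when different, 2 on errors (e.g. missing file).
  if diff -u "$baseline" "$current"; then
    echo "PASS: results match baseline"
  else
    echo "FAIL: results differ from baseline" >&2
    return 1
  fi
}
```

A non-zero return fails the CI job, so the PR cannot merge until the regression is fixed or the baseline is deliberately updated. Against live services, the same exact diff would trip on incidental response changes.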

Tradeoffs

  • Replayed data can miss recent external changes.
  • You may still need periodic live integration checks.
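A typical workflow records tool calls once, then replays them against a baseline on each CI run:

```bash
# Record mode: run the eval suite and capture its tool calls
runledger run ./evals/demo --mode record
# Replay mode: rerun with the recorded tool calls and compare to the baseline
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```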
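The split recommended above, deterministic replay on every PR and live integration tests on a schedule, can be wired into CI with a small dispatcher. This is a hypothetical sketch: it assumes GitHub Actions, which sets `GITHUB_EVENT_NAME` to `schedule` for cron-triggered runs, and `./scripts/integration-tests.sh` is a placeholder for your own live suite. It prints the selected command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical CI dispatcher (sketch): replay on PRs, live checks on a schedule.
set -euo pipefail

select_ci_command() {
  local event="${1:-}"
  case "$event" in
    schedule)
      # Periodic run: exercise live third-party systems.
      # Placeholder path; substitute your own integration suite.
      echo "./scripts/integration-tests.sh"
      ;;
    *)
      # Every other trigger (e.g. pull_request): deterministic replay gate.
      echo "runledger run ./evals/demo --mode replay --baseline baselines/demo.json"
      ;;
  esac
}

# Dry run: print the command for this trigger instead of executing it.
# GITHUB_EVENT_NAME is set by GitHub Actions; default to pull_request locally.
select_ci_command "${GITHUB_EVENT_NAME:-pull_request}"
```

Other CI systems expose the trigger type under different variable names, so the `case` key is the only part that would need adapting.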

When NOT to use RunLedger

If you require live external behavior on every CI run, use integration tests instead.
