COMPARISON
RunLedger vs Integration Tests
Integration tests validate live systems. RunLedger provides deterministic CI gates for tool-using agents.
Direct Answer
Recommendation
Use RunLedger for deterministic CI gates, and reserve integration tests for staging or periodic checks.
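Integration tests exercise real systems, which makes them valuable but also prone to flakiness from network latency, rate limits, and changing third-party data. RunLedger records tool calls once and replays them, so CI runs are fast and deterministic.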
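To make the record-and-replay idea concrete, here is a minimal sketch in Python. It is not RunLedger's implementation or API: `ToolCassette`, `call_key`, and `get_weather` are hypothetical names invented for illustration, and the sketch assumes tool arguments are JSON-serializable.

```python
import json
from typing import Any, Callable, Dict, Optional


def call_key(tool: str, args: Dict[str, Any]) -> str:
    """Build a stable lookup key from the tool name and its arguments."""
    return json.dumps({"tool": tool, "args": args}, sort_keys=True)


class ToolCassette:
    """Hypothetical record/replay store (illustration only, not RunLedger's API)."""

    def __init__(self, mode: str, recordings: Optional[Dict[str, Any]] = None) -> None:
        if mode not in ("record", "replay"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.recordings: Dict[str, Any] = dict(recordings or {})

    def call(self, tool: str, args: Dict[str, Any], live: Callable[..., Any]) -> Any:
        key = call_key(tool, args)
        if self.mode == "record":
            # Hit the real tool once and remember what it returned.
            result = live(**args)
            self.recordings[key] = result
            return result
        # Replay mode: never touch the live tool, fail loudly on unknown calls.
        if key not in self.recordings:
            raise KeyError(f"no recording for {tool} with args {args}")
        return self.recordings[key]


def get_weather(city: str) -> Dict[str, Any]:
    """Stand-in for a real network-backed tool."""
    return {"city": city, "temp_c": 21}


def unavailable(**_: Any) -> Any:
    """Simulates an external service that is down during CI."""
    raise RuntimeError("live tool should not be called in replay mode")


recorder = ToolCassette("record")
recorder.call("get_weather", {"city": "Oslo"}, get_weather)

replayer = ToolCassette("replay", recorder.recordings)
result = replayer.call("get_weather", {"city": "Oslo"}, unavailable)
print(result)
```

Because replay mode returns the stored value without invoking the live function, the run produces the same output even when the external service is slow or unavailable.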
Quick Decision
| Use RunLedger when | Use integration tests when |
|---|---|
| You need fast deterministic CI | You need live system validation |
| You want replayed tool calls | You require real external responses every run |
When integration tests are better
- You need to validate live third-party behavior.
- You are exercising infrastructure changes.
- You can tolerate slower, flaky runs.
When RunLedger wins
- You want stable, fast CI for every PR.
- You need repeatable tool outputs and clear regressions.
- You want to gate merges on deterministic results.
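Gating a merge on deterministic results can be sketched as a comparison between a stored baseline and the current run. The snippet below is an illustration only: the pass/fail dictionary format, the case names, and the `find_regressions` and `gate` helpers are assumptions, not RunLedger's baseline schema or API.

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return cases that passed in the baseline but fail or are missing now."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def gate(baseline: Dict[str, bool], current: Dict[str, bool]) -> int:
    """Return a process exit code: 0 allows the merge, 1 blocks it."""
    regressions = find_regressions(baseline, current)
    for name in regressions:
        print(f"REGRESSION: {name}")
    return 1 if regressions else 0


# Hypothetical eval cases; "escalation" already failed in the baseline,
# so it does not count as a new regression.
baseline = {"lookup_order": True, "refund_flow": True, "escalation": False}
current = {"lookup_order": True, "refund_flow": False, "escalation": False}

exit_code = gate(baseline, current)
```

In a CI job, the returned code would typically be passed to `sys.exit` so that a regression fails the check. Since replayed tool outputs do not vary between runs, a failure here points at a change in the agent rather than at the environment.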
Tradeoffs
- Replayed data reflects the moment it was recorded, so it can miss recent changes in external systems.
- You may still need periodic live integration checks.
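One way to cover the gap described above is a scheduled job that compares recorded tool outputs with fresh live responses and flags any that have drifted. The following is a sketch under assumptions: `find_drift`, the tool names, and the stand-in fetch functions are hypothetical and not part of RunLedger.

```python
from typing import Any, Callable, Dict, List


def find_drift(
    recorded: Dict[str, Any],
    live_tools: Dict[str, Callable[[], Any]],
) -> List[str]:
    """Return names of tools whose live output no longer matches the recording."""
    drifted = []
    for name, expected in recorded.items():
        fetch = live_tools.get(name)
        if fetch is None:
            continue  # no live counterpart to compare against
        if fetch() != expected:
            drifted.append(name)
    return sorted(drifted)


# Recorded outputs, as they might have been captured during an earlier run.
recorded = {
    "get_plan": {"name": "pro", "price_usd": 20},
    "get_status": {"ok": True},
}

# Stand-ins for real live calls; here the plan price has changed upstream.
live_tools = {
    "get_plan": lambda: {"name": "pro", "price_usd": 25},
    "get_status": lambda: {"ok": True},
}

stale = find_drift(recorded, live_tools)
print(stale)
```

Exact equality only suits stable fields. Volatile values such as timestamps or request IDs would need to be removed or normalized before comparing, otherwise every tool would be reported as drifted.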
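```bash
# Record tool calls during a live run
runledger run ./evals/demo --mode record
# Replay the recorded calls, using the stored baseline
runledger run ./evals/demo --mode replay --baseline baselines/demo.json
```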
When NOT to use RunLedger
If you require live external behavior on every CI run, use integration tests instead.