RunLedger regression gates
Baselines capture known-good behavior so regressions become hard CI failures.
Direct Answer
RunLedger compares replay runs to a baseline summary and fails CI when success rate, cost, or latency regress.
Quick Decision
| Use RunLedger when | Consider alternatives when |
|---|---|
| You want automated regression gates. | You only need manual inspection. |
| You can maintain baselines. | You cannot define stable expectations. |
| You need PR blocking failures. | You only want soft metrics. |
Diff command
```bash
runledger diff --baseline baselines/<suite>.json --run runledger_out/<suite>/<run_id>
```
Typical regression signals
- Success rate drops below threshold.
- Latency p95 exceeds allowed delta.
- Cost or token usage spikes.
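The signals above can be sketched as a small threshold check. This is only an illustration of the idea, not RunLedger's actual implementation or schema: the `Summary` and `Thresholds` types, their field names, and the default limits are all hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Hypothetical run summary; not RunLedger's real baseline format."""
    success_rate: float    # fraction of passing cases, 0.0 to 1.0
    latency_p95_ms: float  # 95th percentile latency in milliseconds
    cost_usd: float        # total cost of the run


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical limits; real values need tuning per suite."""
    min_success_rate: float = 0.95
    max_latency_p95_delta_ms: float = 200.0
    max_cost_increase_ratio: float = 0.10  # 10% over baseline


def find_regressions(baseline: Summary, run: Summary, limits: Thresholds) -> list[str]:
    """Return one message per regression signal that fired."""
    failures = []

    # Signal 1: success rate drops below the absolute threshold.
    if run.success_rate < limits.min_success_rate:
        failures.append(
            f"success rate {run.success_rate:.2f} < {limits.min_success_rate:.2f}"
        )

    # Signal 2: p95 latency grows by more than the allowed delta.
    latency_delta = run.latency_p95_ms - baseline.latency_p95_ms
    if latency_delta > limits.max_latency_p95_delta_ms:
        failures.append(f"latency p95 rose by {latency_delta:.0f} ms")

    # Signal 3: cost rises by more than the allowed ratio over baseline.
    if baseline.cost_usd > 0:
        cost_ratio = (run.cost_usd - baseline.cost_usd) / baseline.cost_usd
        if cost_ratio > limits.max_cost_increase_ratio:
            failures.append(f"cost rose by {cost_ratio:.0%}")

    return failures


# Example values are made up for illustration.
baseline = Summary(success_rate=0.98, latency_p95_ms=1200.0, cost_usd=0.50)
run = Summary(success_rate=0.91, latency_p95_ms=1500.0, cost_usd=0.52)
failures = find_regressions(baseline, run, Thresholds())
for message in failures:
    print("REGRESSION:", message)
```

In a CI job, a non-empty `failures` list would map to a non-zero exit code so the pull request is blocked. With the example values, the success rate and latency checks fire while the cost check does not.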
Tradeoffs
- Baselines require intentional promotion.
- Thresholds need tuning so that normal run-to-run noise does not fail the gate.
- Large intentional changes can trigger expected failures until a new baseline is promoted.
When NOT to use RunLedger
Skip baseline gates if outputs are too exploratory or unstable to baseline.