RunLedger baseline promote

Baselines are summary JSON files from a known-good run that CI compares against to catch regressions.

baselines ci gating
Updated 2026-01-26

Direct Answer

A RunLedger baseline is a summary JSON from a known-good run. CI compares each replay run to the baseline and fails when success rate, cost, or p95 latency regresses beyond the configured thresholds.

Quick Decision

Use RunLedger when:

  • You want merge gates for regressions.
  • You can promote a known-good run as a reference.
  • You want automated diffs and threshold checks.

Consider alternatives when:

  • You only need manual review of results.
  • Outputs are too volatile to baseline.
  • You only want to log metrics.

Create and use a baseline

bash
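# Promote the summary of a known-good run to the suite's baseline file.
runledger baseline promote --from runledger_out/<suite>/<run_id> --to baselines/<suite>.json
# Replay the suite and compare the results against that baseline.
runledger run ./evals/<suite> --mode replay --baseline baselines/<suite>.json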
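In CI, promotion is usually a deliberate, occasional step, while the replay runs on every change. The following is a minimal sketch of a gate step built around the replay command above. It assumes that `runledger run` exits non-zero when the replay regresses against the baseline, which this page implies but does not state. The function name `gate_suite` and the exit code 2 for a missing baseline are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate step (a sketch, not part of RunLedger).
# Assumption: `runledger run` exits non-zero when the replay regresses.

gate_suite() {
  local suite="$1"
  local baseline="baselines/${suite}.json"

  # Refuse to gate without a promoted baseline instead of passing silently.
  if [ ! -f "$baseline" ]; then
    echo "no baseline for suite '${suite}' at ${baseline}" >&2
    return 2
  fi

  # Same replay command as above; its exit status becomes the gate result.
  runledger run "./evals/${suite}" --mode replay --baseline "$baseline"
}
```

A CI job could then call `gate_suite <suite>` and treat any non-zero status as a failed merge gate. Failing on a missing baseline is a design choice: it prevents a suite from appearing green only because nothing was compared.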

What baselines gate

  • Success rate drops below the configured threshold.
  • Costs spike beyond the allowed delta.
  • Latency p95 increases beyond the allowed delta.
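The three checks above amount to comparing a run's summary metrics with the baseline under per-metric tolerances. The sketch below illustrates that idea only; it is not RunLedger's implementation. The helper names `check_min` and `check_delta` are hypothetical, metrics are passed as plain numbers rather than read from the summary JSON (whose schema is not shown here), and the "allowed delta" is assumed to be relative (0.10 meaning 10%), which this page does not specify.

```shell
# Illustrative threshold checks (hypothetical helpers, not RunLedger code).

# check_min NAME VALUE MIN
# Fails when VALUE is below the absolute floor MIN (e.g. success rate).
check_min() {
  awk -v n="$1" -v v="$2" -v min="$3" 'BEGIN {
    if ((v + 0) < (min + 0)) { printf("FAIL %s: %s < %s\n", n, v, min); exit 1 }
    printf("ok   %s\n", n)
  }'
}

# check_delta NAME BASELINE CURRENT MAX_DELTA
# Fails when CURRENT exceeds BASELINE by more than MAX_DELTA.
# Assumption: the delta is relative, so 0.10 allows a 10% increase.
check_delta() {
  awk -v n="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    limit = (b + 0) * (1 + d)
    if ((c + 0) > limit) { printf("FAIL %s: %s > %s\n", n, c, limit); exit 1 }
    printf("ok   %s\n", n)
  }'
}
```

For example, `check_min success_rate 0.95 0.90` passes, while `check_delta latency_p95 200 260 0.10` fails because 260 is more than 10% above the baseline value of 200. A gate would run one check per metric and fail if any of them returns non-zero.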

Tradeoffs

  • Baselines require periodic promotion as behavior changes.
  • Thresholds need tuning to avoid noisy failures.
  • Large changes can require deliberate baseline updates.

When NOT to use RunLedger

Avoid baseline gating when outputs are exploratory or when you cannot define stable success criteria.

Next steps