ABOUT

Deterministic CI for agents.

RunLedger exists to make agent behavior repeatable, testable, and safe to ship. We believe agent systems deserve the same rigor as any other software.

Mission

Give teams a deterministic harness for tool-using agents: replayable tools, hard assertions, and budgets that stop regressions before they ship.

Principles

Determinism over vibes

Record tool outputs once and replay them to keep CI stable.
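As a rough illustration of the record-then-replay idea, here is a minimal sketch. It does not show RunLedger's actual API: `ReplayTool`, the cassette file layout, and the `search` tool are all hypothetical names chosen for this example.

```python
import json
import tempfile
from pathlib import Path


class ReplayTool:
    """Hypothetical wrapper: records a tool's outputs once, then replays them."""

    def __init__(self, fn, cassette_path, mode="replay"):
        self.fn = fn
        self.path = Path(cassette_path)
        self.mode = mode
        # Load earlier recordings if the cassette file already exists.
        self.cassette = json.loads(self.path.read_text()) if self.path.exists() else {}

    def __call__(self, **kwargs):
        # Key each call by its arguments, serialized in a stable order.
        key = json.dumps(kwargs, sort_keys=True)
        if self.mode == "record":
            result = self.fn(**kwargs)
            self.cassette[key] = result
            self.path.write_text(json.dumps(self.cassette, indent=2, sort_keys=True))
            return result
        if key not in self.cassette:
            # Fail loudly rather than silently calling the live tool in CI.
            raise KeyError(f"no recorded output for call: {key}")
        return self.cassette[key]


calls = []  # tracks how often the "live" tool actually runs


def search(query):
    """Stand-in for a real, non-deterministic tool (assumption for this sketch)."""
    calls.append(query)
    return {"hits": [f"result for {query}"]}


cassette_file = Path(tempfile.mkdtemp()) / "search.json"
recorded = ReplayTool(search, cassette_file, mode="record")(query="refund policy")
replayed = ReplayTool(search, cassette_file, mode="replay")(query="refund policy")
```

In this sketch the live tool runs only during recording. A replayed call with arguments that were never recorded raises an error instead of reaching the network, which is what keeps the CI run stable.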

Hard assertions

Agent output is validated against schemas, not judged by intuition.
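To make "validated by schemas" concrete, here is a small sketch using only the standard library. The schema format, the field names, and `validate_output` are assumptions for illustration. A real setup might use JSON Schema or a similar standard instead.

```python
# Hypothetical schema: field name -> required Python type.
SCHEMA = {"category": str, "reply": str, "confidence": float}


def validate_output(output, schema):
    """Return a list of schema violations; an empty list means the output passes."""
    errors = []
    for field, expected in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            actual = type(output[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    for field in output:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


good = {"category": "billing", "reply": "Refund issued.", "confidence": 0.92}
bad = {"category": "billing", "confidence": "high"}

good_errors = validate_output(good, SCHEMA)
bad_errors = validate_output(bad, SCHEMA)
```

The check either passes or returns a concrete list of violations, so a failing run points at the exact field that broke rather than relying on someone reading the output and judging it.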

Budgets as guardrails

Limits on latency, tool calls, and cost are enforced as merge gates.
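One way to picture a budget acting as a merge gate is the sketch below. The `Budget` fields, the metric names, and the limits are hypothetical. The only assumption about CI is the usual convention that a nonzero exit code fails the job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical per-run limits; names and units are assumptions."""
    max_latency_ms: float
    max_tool_calls: int
    max_cost_usd: float


def check_budget(metrics, budget):
    """Return one message per metric that exceeds its limit."""
    limits = {
        "latency_ms": budget.max_latency_ms,
        "tool_calls": budget.max_tool_calls,
        "cost_usd": budget.max_cost_usd,
    }
    return [
        f"{name}: {metrics[name]} > {limit}"
        for name, limit in limits.items()
        if metrics[name] > limit
    ]


def exit_code(violations):
    """Nonzero exit code fails the CI job, which blocks the merge."""
    return 1 if violations else 0


budget = Budget(max_latency_ms=2000.0, max_tool_calls=5, max_cost_usd=0.10)
metrics = {"latency_ms": 1850.0, "tool_calls": 7, "cost_usd": 0.04}

violations = check_budget(metrics, budget)
status = exit_code(violations)
```

Here the run stays within its latency and cost limits but makes seven tool calls against a limit of five, so the gate fails on that single metric.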

Artifacts are contracts

Stable outputs make regressions easy to diff and review.
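The value of a stable artifact shows up when two runs are compared. The sketch below assumes artifacts are JSON-serializable dictionaries, which is an assumption made for this example. The helper names are hypothetical.

```python
import difflib
import json


def canonical(artifact):
    """Serialize with sorted keys so key order never produces a spurious diff."""
    return json.dumps(artifact, indent=2, sort_keys=True).splitlines()


def diff_artifacts(baseline, candidate):
    """Return unified-diff lines between two artifacts; empty if they match."""
    return list(
        difflib.unified_diff(
            canonical(baseline),
            canonical(candidate),
            fromfile="baseline",
            tofile="candidate",
            lineterm="",
        )
    )


baseline = {"tool_calls": 2, "category": "billing"}
reordered = {"category": "billing", "tool_calls": 2}  # same content, new key order
regressed = {"category": "billing", "tool_calls": 5}

no_change = diff_artifacts(baseline, reordered)
regression = diff_artifacts(baseline, regressed)
```

Because serialization is canonical, a reordered but otherwise identical artifact produces no diff, while a real change surfaces as a single removed line and a single added line that a reviewer can read directly.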

Where we are going

The next milestones focus on record mode, richer HTML reports, and baseline workflows for teams.

See the roadmap

Why it matters

Agent regressions are easy to miss and expensive to debug. Deterministic evals make failures actionable and auditable.