ABOUT
Deterministic CI for agents.
RunLedger exists to make agent behavior repeatable, testable, and safe to ship. We believe agent systems deserve the same rigor as any other software.
Mission
Give teams a deterministic harness for tool-using agents: replayable tools, hard assertions, and budgets that stop regressions before they ship.
Principles
Determinism over vibes
Record tool outputs once and replay them to keep CI stable.
Hard assertions
Agent output is validated by schemas, not intuition.
Budgets as guardrails
Latency, tool calls, and cost are enforced as merge gates.
Artifacts are contracts
Stable outputs make regressions easy to diff and review.
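To make these principles concrete, here is a minimal sketch of one deterministic check: a recorded tool output is replayed, the agent's output is checked against required fields, and a tool-call budget is enforced. This is not RunLedger's API. `ReplayTool`, `check_output`, `check_budget`, `toy_agent`, and the cassette format are hypothetical names invented for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class ReplayTool:
    """Hypothetical stand-in for a tool whose outputs were recorded once."""
    cassette: dict  # maps canonical JSON of the call arguments to the recorded output
    calls: int = 0

    def __call__(self, **args):
        self.calls += 1
        key = json.dumps(args, sort_keys=True)
        if key not in self.cassette:
            # An unrecorded call is a failure, never a live request.
            raise KeyError(f"no recorded output for {key}")
        return self.cassette[key]


def check_output(output, required):
    """Hard assertion: each required field must exist with the expected type."""
    errors = []
    for name, expected_type in required.items():
        if name not in output:
            errors.append(f"missing field: {name}")
        elif not isinstance(output[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    return errors


def check_budget(tool_calls, max_tool_calls):
    """Budget guardrail: report a failure when the call count exceeds the limit."""
    if tool_calls <= max_tool_calls:
        return []
    return [f"tool calls {tool_calls} exceed budget {max_tool_calls}"]


def toy_agent(search):
    """Toy agent (assumption): one search call, then a structured answer."""
    hits = search(query="refund policy")
    return {"answer": hits[0], "sources": len(hits)}


cassette = {
    json.dumps({"query": "refund policy"}, sort_keys=True): ["Refunds within 30 days."]
}
search = ReplayTool(cassette)
output = toy_agent(search)
failures = check_output(output, {"answer": str, "sources": int})
failures += check_budget(search.calls, max_tool_calls=2)
print("PASS" if not failures else failures)
```

Because the tool never touches the network, the same inputs give the same result on every CI run, and any failure arrives as a short list of messages that can be diffed and reviewed.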
Where we are going
Next milestones focus on record mode, richer HTML reports, and baseline workflows for teams.
See the roadmap
Why it matters
Agent regressions are easy to miss and expensive to debug. Deterministic evals make failures actionable and auditable.