CI FOR TOOL-USING AGENTS

RunLedger
CI for tool-using agents.

Control the chaos of probabilistic software.

RunLedger acts as a flight recorder and gatekeeper for your agents. Deterministic evals, replayable tool calls, and schema enforcement in one CLI.

View Live Demo
INPUT STREAM INTERCEPTION LAYER PRODUCTION
AGENT
RunLedger
LLM API
regressions.diff
FAIL
BASELINE (main)
CURRENT (PR #124)
1 {
2 "tool": "stripe_refund",
3 "args": {
4 "id": "ch_1Mc...",
5 "reason": "fraud"
6 }
7 }
1 {
2 "tool": "stripe_refund",
3 "args": {
4 "id": "ch_1Mc...",
MISSING ARGUMENT
6 }
7 }
Schema Error

Field 'reason' is required in StripeRefundSchema.

Regression Blocking

Catch drift before it costs you.

LLMs are probabilistic, but your tool interfaces are strict. RunLedger enforces Pydantic schemas and compares execution traces against known baselines.

  • Trace comparison

    Diff the full execution trace (messages, tool calls, outputs).

  • Schema Enforcement

    Fail the build if an agent hallucinates a parameter.

The Developer Infrastructure for Agents

Built for teams shipping to production, not just prototyping.

Live Run
240ms
Replay
0ms
CACHED

Deterministic Replay

Network calls (Stripe, Postgres) are recorded once. In CI, we replay the recording. Tests are instant and deterministic.

@record_replay
[INFO] Agent initialized
[TOOL] Calling search_users...
[DEBUG] Payload size: 2kb
[OK] Tool output received
[INFO] Reasoning step 2...
[TOOL] Calling update_db...
[OK] Commit successful

Flight Recorder

We capture agent messages, tool calls, tool outputs, and structured logs to build a complete trace of execution.

runledger record
Latency Budget 420ms / 500ms
Token Budget FAIL

Cost & Latency Gates

Set budgets for token usage and latency. If a prompt change causes your agent to loop or ramble, the test fails.

budget: 500ms

Drop-in Middleware

RunLedger works with any agent framework. Wrap your agent as a subprocess runner—minimal changes.

  • LangChain / LangGraph
  • LlamaIndex
  • AutoGen / CrewAI
  • Raw Python / Node.js
Client
RL
PROXY
RunLedger
LLM API

Ship agents with confidence.