REFERENCE
CLI, config, and protocol reference.
Every command, every field, and every message type your agent needs to speak in production.
CLI commands
run
Execute a suite in replay or record mode and write artifacts.
runledger run ./evals --mode replay
runledger run ./evals --mode record --case t1
diff
Compare a run directory against a baseline file.
runledger diff --baseline baselines/support.json --run runledger_out/demo/2025-01-01
baseline promote
Promote a successful run to a baseline file for regression gates.
runledger baseline promote --from runledger_out/demo/2025-01-01 --to baselines/demo.json
init
Create a demo agent, eval suite, and example cassette.
runledger init
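The run and diff commands above can be chained into a single CI gate. The following is a minimal sketch, not an official wrapper: it uses only the flags shown in this section, the helper names (build_run_cmd, build_diff_cmd, gate) are hypothetical, and it assumes that a non-zero exit code from either command signals a failure or regression, which this reference does not state.

```python
import subprocess


def build_run_cmd(suite_dir, mode="replay", case=None):
    """Build the argv for `runledger run` using only the flags shown above."""
    cmd = ["runledger", "run", suite_dir, "--mode", mode]
    if case is not None:
        cmd += ["--case", case]
    return cmd


def build_diff_cmd(baseline, run_dir):
    """Build the argv for `runledger diff`."""
    return ["runledger", "diff", "--baseline", baseline, "--run", run_dir]


def gate(suite_dir, baseline, run_dir, runner=subprocess.run):
    """Run the suite in replay mode, then diff the run against the baseline.

    Assumption: each command exits non-zero on failure. Returns True only
    if both commands exit with code 0.
    """
    for cmd in (build_run_cmd(suite_dir), build_diff_cmd(baseline, run_dir)):
        if runner(cmd).returncode != 0:
            return False
    return True
```

The run directory is passed in explicitly because where a run writes its artifacts is not specified here; the caller supplies a path such as the runledger_out/demo/2025-01-01 directory used in the examples above.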
suite.yaml
suite_name: support-triage
agent_command: ["python", "agent.py"]
mode: replay
cases_path: cases
tool_registry:
  - search_docs
  - create_issue
assertions:
  - type: json_schema
    schema_path: schema.json
budgets:
  max_wall_ms: 20000
  max_tool_calls: 10
baseline_path: baselines/support-triage.json
suite_name is the stable CI identifier.
agent_command launches your agent as a subprocess.
tool_registry restricts which tools can be invoked.
assertions and budgets apply to all cases by default.
baseline_path enables regression gating in CI.
cases/*.yaml
id: t1
description: "triage a login ticket"
input:
  ticket: "User cannot login"
  context:
    plan: "pro"
cassette: cassettes/t1.jsonl
assertions:
  - type: required_fields
    fields: ["category", "reply"]
budgets:
  max_wall_ms: 5000
input is forwarded verbatim to the agent.
cassette is required for replay mode.
assertions override or extend suite defaults.
budgets can be customized per case.
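One way to picture per-case budgets is as a key-by-key override of the suite defaults. The sketch below is an illustration under that assumption only: this reference does not spell out the merge rule, and effective_budgets is a hypothetical helper, not part of RunLedger.

```python
def effective_budgets(suite_budgets, case_budgets=None):
    """Return the budgets applied to one case.

    Assumption: a key set in the case file replaces the suite value,
    and keys the case omits fall back to the suite default.
    """
    merged = dict(suite_budgets)        # copy so the suite defaults stay untouched
    merged.update(case_budgets or {})   # case values win where present
    return merged


suite = {"max_wall_ms": 20000, "max_tool_calls": 10}
case = {"max_wall_ms": 5000}
print(effective_budgets(suite, case))
# {'max_wall_ms': 5000, 'max_tool_calls': 10}
```

With the example files above, case t1 would then run with a 5000 ms wall-clock limit while keeping the suite's limit of 10 tool calls.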
Protocol messages
stdin/stdout JSONL
Runner to Agent
{ "type": "task_start", "task_id": "t1", "input": { "ticket": "..." } }
{ "type": "tool_result", "call_id": "c1", "ok": true, "result": { "hits": [] } }
Agent to Runner
{ "type": "tool_call", "name": "search_docs", "call_id": "c1", "args": { "q": "..." } }
{ "type": "final_output", "output": { "category": "billing", "reply": "..." } }
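The four messages above are enough to sketch a complete agent loop. The example below is a minimal sketch, not a reference implementation: it assumes one JSON object per line on stdin and stdout, a single tool call per task, and a fixed call_id of "c1". The function names send and run_agent, the "account" category, and the reply text are placeholders chosen for illustration.

```python
import json
import sys


def send(msg, out):
    """Write one protocol message as a single JSON line and flush it."""
    out.write(json.dumps(msg) + "\n")
    out.flush()


def run_agent(inp=sys.stdin, out=sys.stdout):
    """Read runner messages line by line and answer with agent messages."""
    for line in inp:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        msg = json.loads(line)

        if msg["type"] == "task_start":
            # Ask the runner to execute a tool from the registry.
            query = msg["input"].get("ticket", "")
            send({"type": "tool_call", "name": "search_docs",
                  "call_id": "c1", "args": {"q": query}}, out)

        elif msg["type"] == "tool_result":
            # Use the tool result (if the call succeeded) to build the answer.
            result = (msg.get("result") or {}) if msg.get("ok") else {}
            hits = result.get("hits", [])
            send({"type": "final_output",
                  "output": {"category": "account",
                             "reply": f"Found {len(hits)} related docs."}}, out)
            break  # one final_output ends the task in this sketch
```

To use this as the agent.py named in agent_command, add a call to run_agent() at the bottom of the file. Flushing after every message matters because the runner reads from a pipe and would otherwise wait on buffered output.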