REFERENCE

CLI, config, and protocol reference.

Every CLI command, config field, and protocol message type you need to run your agent in production.

CLI commands

Quickstart

run

Execute a suite in replay or record mode and write artifacts.

runledger run ./evals --mode replay
runledger run ./evals --mode record --case t1

diff

Compare a run directory against a baseline file.

runledger diff --baseline baselines/support.json --run runledger_out/demo/2025-01-01

baseline promote

Promote a successful run to a baseline file for regression gates.

runledger baseline promote --from runledger_out/demo/2025-01-01 --to baselines/demo.json

init

Create a demo agent, eval suite, and example cassette.

runledger init
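
The run and diff commands above can be chained into a single CI gate. The following is a minimal sketch, not an official integration: the helper names build_commands and run_gate are hypothetical, the run directory is passed in by the caller because this reference does not say how it is chosen, and it assumes runledger exits with a nonzero status on failure, which is not confirmed here.

```python
import subprocess


def build_commands(suite_dir, baseline, run_dir):
    """Return argv lists for a replay run followed by a baseline diff.

    Only flags shown in this reference are used.
    """
    run_cmd = ["runledger", "run", suite_dir, "--mode", "replay"]
    diff_cmd = ["runledger", "diff", "--baseline", baseline, "--run", run_dir]
    return [run_cmd, diff_cmd]


def run_gate(commands, runner=subprocess.run):
    """Run each command in order and stop at the first nonzero exit code.

    Assumption: a nonzero exit code signals failure.
    """
    for argv in commands:
        code = runner(argv).returncode
        if code != 0:
            return code
    return 0
```

In a CI job, something like sys.exit(run_gate(build_commands("./evals", "baselines/support.json", "runledger_out/demo/2025-01-01"))) would fail the build as soon as either step fails. The runner parameter exists only so the gate can be exercised without the CLI installed.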

suite.yaml

suite_name: support-triage
agent_command: ["python", "agent.py"]
mode: replay
cases_path: cases
tool_registry:
  - search_docs
  - create_issue
assertions:
  - type: json_schema
    schema_path: schema.json
budgets:
  max_wall_ms: 20000
  max_tool_calls: 10
baseline_path: baselines/support-triage.json

suite_name is the stable CI identifier.

agent_command launches your agent as a subprocess.

tool_registry restricts which tools can be invoked.

assertions and budgets apply to every case unless a case overrides them.

baseline_path enables regression gating in CI.
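
To make the roles of tool_registry and budgets concrete, here is a small sketch of the kind of check they imply. It is an illustration only, not the runner's actual implementation: the function check_tool_call is hypothetical, the config is written as a Python dict mirroring the YAML above, and treating max_tool_calls as an inclusive limit is an assumption.

```python
# Mirrors the tool_registry and budgets fields of the suite.yaml above.
SUITE = {
    "tool_registry": ["search_docs", "create_issue"],
    "budgets": {"max_wall_ms": 20000, "max_tool_calls": 10},
}


def check_tool_call(suite, name, calls_so_far):
    """Return None if the call is allowed, else a short reason string.

    Assumption: exactly max_tool_calls calls are allowed (inclusive limit).
    """
    if name not in suite["tool_registry"]:
        return f"tool not in registry: {name}"
    if calls_so_far + 1 > suite["budgets"]["max_tool_calls"]:
        return "max_tool_calls exceeded"
    return None
```

For example, check_tool_call(SUITE, "search_docs", 0) returns None, while a tool name that is not listed in tool_registry is rejected regardless of the remaining budget.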

cases/*.yaml

id: t1
description: "triage a login ticket"
input:
  ticket: "User cannot login"
  context:
    plan: "pro"
cassette: cassettes/t1.jsonl
assertions:
  - type: required_fields
    fields: ["category", "reply"]
budgets:
  max_wall_ms: 5000

input is forwarded verbatim to the agent.

cassette is required for replay mode.

assertions override or extend suite defaults.

budgets can be customized per case.
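
The relationship between suite defaults and per-case settings can be sketched as follows. This is a guess at reasonable semantics rather than documented behavior: it assumes case budgets replace suite budgets key by key, and that required_fields checks for top-level keys in the agent's final output. Both function names are hypothetical.

```python
def effective_budgets(suite_budgets, case_budgets):
    """Merge budgets, letting case values win key by key.

    Assumption: keys the case does not set fall back to the suite value.
    """
    merged = dict(suite_budgets)
    merged.update(case_budgets or {})
    return merged


def missing_required_fields(output, fields):
    """Return the required field names absent from the output's top level."""
    return [field for field in fields if field not in output]
```

With the suite.yaml and case file shown in this reference, effective_budgets would yield max_wall_ms: 5000 from the case and max_tool_calls: 10 from the suite.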

Protocol messages

stdin/stdout JSONL

Runner to Agent

{ "type": "task_start", "task_id": "t1", "input": { "ticket": "..." } }
{ "type": "tool_result", "call_id": "c1", "ok": true, "result": { "hits": [] } }

Agent to Runner

{ "type": "tool_call", "name": "search_docs", "call_id": "c1", "args": { "q": "..." } }
{ "type": "final_output", "output": { "category": "billing", "reply": "..." } }
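
A minimal agent that speaks these four message types might look like the sketch below. It assumes one JSON object per line, that the runner answers each tool_call with a tool_result before expecting more output, and that a single task is handled per process; none of this is spelled out in the reference. The function name run_agent, the fixed call_id "c1", and the placeholder category and reply are illustrative only.

```python
import json
import sys


def run_agent(stdin=sys.stdin, stdout=sys.stdout):
    """Handle one task: call search_docs once, then emit a final output."""

    def send(message):
        stdout.write(json.dumps(message) + "\n")
        stdout.flush()  # flush so the runner sees each line immediately

    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        message = json.loads(line)
        if message["type"] == "task_start":
            ticket = message["input"].get("ticket", "")
            send({"type": "tool_call", "name": "search_docs",
                  "call_id": "c1", "args": {"q": ticket}})
        elif message["type"] == "tool_result":
            # Treat a failed tool call as "no hits" (an assumption).
            hits = message["result"].get("hits", []) if message.get("ok") else []
            send({"type": "final_output",
                  "output": {"category": "billing",  # placeholder value
                             "reply": f"Found {len(hits)} related docs."}})
            break


if __name__ == "__main__" and not sys.stdin.isatty():
    run_agent()
```

Saved as agent.py, this matches the agent_command shown in suite.yaml. Taking the streams as parameters keeps the loop testable with in-memory buffers.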