CAUM Product Docs
CAUM observes agent work structurally and detects when compute stops converting cleanly into progress, without reading private content.
CAUM does not judge semantic truth, does not classify answer content, does not block agents, and does not claim universal future-outcome prediction. T1-T5 are structural health tiers. T4 is review-only. T5 and hard alerts are stronger structural evidence.
Product Surfaces
CAUM Receipt
Upload a trace file and receive a paid structural report with loop, cycle, token, cost, and evidence grading.
Live Meter
Send structural events while an agent runs. CAUM returns observe-only health and evidence fields.
Pilot Meter
Analyze grouped tasks from a pilot, compare task-level structural exposure, and estimate reviewable cost.
Research
Historical AUC/Cohen results remain research context, not the production sales claim.
Live Meter API
Base URL: https://caum-observation-production.up.railway.app
| Method | Endpoint | Purpose |
|---|---|---|
| GET | /v2/live/health | Check deployed Live Meter status. |
| POST | /v2/live/start | Start an observe-only live session. |
| POST | /v2/live/event | Append one structural event and receive current health. |
| POST | /v2/live/batch | Analyze a batch of structural events. |
| GET | /v2/live/session/{session_id} | Read current durable session state with a valid session token. |
Start a Session
For customer-bound pilots, add the header Authorization: Bearer caum_live_YOUR_KEY to start, event, batch, and session requests. Public demo mode works without a bearer token. Customer-bound sessions require both the API key that started the session and the session token on every later request.
curl -X POST https://caum-observation-production.up.railway.app/v2/live/start \
  -H "Authorization: Bearer caum_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "task_id": "ticket-1842",
    "agent_id": "support-agent",
    "workflow": "customer-support",
    "baseline_cost_usd": 0.08
  }'
The response includes session.session_id and a one-time session_token. Store the token client-side for the duration of that task; CAUM stores only a hash of it. Live Meter persists sanitized structural events and chain state, not raw payloads.
Append Events
curl -X POST https://caum-observation-production.up.railway.app/v2/live/event \
  -H "Authorization: Bearer caum_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "SESSION_ID_FROM_START",
    "session_token": "SESSION_TOKEN_FROM_START",
    "event": {
      "event": "tool_call",
      "tool": "search",
      "phase": "request",
      "input_tokens": 320,
      "cost_usd": 0.002
    }
  }'
Minimal Python Client
import requests

API = "https://caum-observation-production.up.railway.app"
HEADERS = {"Authorization": "Bearer caum_live_YOUR_KEY"}

# Start an observe-only session and keep the one-time session token.
session = requests.post(f"{API}/v2/live/start", json={
    "tenant_id": "acme",
    "agent_id": "agent_1",
    "workflow": "coding_agent"
}, headers=HEADERS).json()
sid = session["session"]["session_id"]
token = session["session_token"]

# Placeholder: replace with your agent's step objects, which must expose
# the attributes read below (kind, tool_name, phase, token counts, cost).
agent_steps = []

# Send one structural event per step; no prompt or payload content is included.
for step in agent_steps:
    result = requests.post(f"{API}/v2/live/event", json={
        "session_id": sid,
        "session_token": token,
        "event": {
            "event": step.kind,
            "tool": step.tool_name,
            "phase": step.phase,
            "input_tokens": step.input_tokens,
            "output_tokens": step.output_tokens,
            "cost_usd": step.cost_usd
        }
    }, headers=HEADERS).json()
    print(result["tier"], result["evidence_grade"]["public_class"])
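Because CAUM is observation-only, an integration may prefer that a metering failure never interrupts the agent itself. The following is a minimal sketch of that idea, not part of the CAUM API: `observe` is a hypothetical wrapper, and `send` stands in for whatever function posts the event (for example, a small function around the `requests.post` call above).

```python
from typing import Any, Callable, Dict, Optional


def observe(send: Callable[[Dict[str, Any]], Dict[str, Any]],
            payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Call `send(payload)` and return its result, or None if it raises.

    Hypothetical helper: swallowing all errors is a deliberate choice for a
    metering side-channel, so the agent keeps running if the meter is down.
    """
    try:
        return send(payload)
    except Exception:
        return None


def fake_ok(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a successful call; the returned dict is illustrative only.
    return {"tier": "T1"}


def fake_down(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a network failure.
    raise ConnectionError("meter unreachable")


ok_result = observe(fake_ok, {"event": {"event": "tool_call"}})
down_result = observe(fake_down, {"event": {"event": "tool_call"}})
print(ok_result, down_result)
```

The trade-off is that failures become silent, so a real integration would likely log the exception before returning `None`.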
Important Response Fields
| Field | Meaning | Public? |
|---|---|---|
| tier | T1-T5 structural health band. | Yes, with boundary text. |
| structural_health | Detailed structural review score and alert state. | Use carefully. |
| evidence_grade.public_class | Conservative public evidence class, such as hard_alert or a review-only class. | Yes. |
| allowed_to_block | Always false. CAUM is observation-only. | Yes. |
| is_failure_claim | Always false. CAUM does not claim semantic failure. | Yes. |
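As a rough illustration of consuming these fields, the sketch below pulls the publicly usable ones out of a response dict. The helper name `summarize_live_response` and the sample values are hypothetical; only the field names come from the table above, and the exact value format of `tier` is an assumption.

```python
from typing import Any, Dict


def summarize_live_response(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the publicly usable fields out of a Live Meter response.

    Hypothetical helper: only the field names are taken from the docs.
    """
    grade = resp.get("evidence_grade") or {}
    return {
        "tier": resp.get("tier"),
        "public_class": grade.get("public_class"),
        # Documented as always false; default to False if the field is absent.
        "allowed_to_block": bool(resp.get("allowed_to_block", False)),
        "is_failure_claim": bool(resp.get("is_failure_claim", False)),
    }


# Illustrative sample, not a captured API response.
sample = {
    "tier": "T5",
    "evidence_grade": {"public_class": "hard_alert"},
    "allowed_to_block": False,
    "is_failure_claim": False,
}
summary = summarize_live_response(sample)
print(summary["tier"], summary["public_class"])
```

The helper deliberately leaves out `structural_health`, which the table marks as "Use carefully".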
Pilot Meter API
Pilot Meter accepts grouped task events or flat events. It is useful when a team wants first-pass structural evidence before buying a deeper integration. CAUM can start from common agent-stack baselines and then calibrate to the customer's workflow as more traces arrive.
curl -X POST https://caum-observation-production.up.railway.app/v2/pilot-meter \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "pilot_id": "week_1",
    "baseline_cost_usd_per_task": 0.08,
    "tasks": [{
      "task_id": "task_1",
      "agent_id": "agent_a",
      "steps": [
        {"event":"tool_call","tool":"search","cost_usd":0.002},
        {"event":"tool_result","tool":"search","status":"completed","cost_usd":0.002}
      ]
    }]
  }'
The response includes portfolio totals, task-level structural health, cost fields, profile detection, and evidence_grade. Cost fields are review evidence, not a claim that every reviewed dollar was wasted.
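The grouped request body used in the curl example can be assembled from flat step records. The sketch below assumes each flat record carries `task_id` and `agent_id` alongside its structural fields; the function name `build_pilot_payload` is hypothetical, and since Pilot Meter also accepts flat events, grouping client-side is optional.

```python
from typing import Any, Dict, Iterable


def build_pilot_payload(
    tenant_id: str,
    pilot_id: str,
    baseline_cost_usd_per_task: float,
    flat_steps: Iterable[Dict[str, Any]],
) -> Dict[str, Any]:
    """Group flat step records by task_id into the grouped request shape."""
    tasks: Dict[str, Dict[str, Any]] = {}  # dicts keep first-seen task order
    for step in flat_steps:
        task = tasks.setdefault(
            step["task_id"],
            {"task_id": step["task_id"], "agent_id": step.get("agent_id"), "steps": []},
        )
        # Keep only the per-step fields; task and agent ids live on the task.
        task["steps"].append(
            {k: v for k, v in step.items() if k not in ("task_id", "agent_id")}
        )
    return {
        "tenant_id": tenant_id,
        "pilot_id": pilot_id,
        "baseline_cost_usd_per_task": baseline_cost_usd_per_task,
        "tasks": list(tasks.values()),
    }


flat = [
    {"task_id": "task_1", "agent_id": "agent_a",
     "event": "tool_call", "tool": "search", "cost_usd": 0.002},
    {"task_id": "task_1", "agent_id": "agent_a",
     "event": "tool_result", "tool": "search", "status": "completed", "cost_usd": 0.002},
]
payload = build_pilot_payload("acme", "week_1", 0.08, flat)
print(len(payload["tasks"]), len(payload["tasks"][0]["steps"]))
```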
PDF Report Upload
The self-service paid path lives at /upload/. Use it when the customer has a trace export but no live integration yet.
| Endpoint | Purpose |
|---|---|
| POST /api/preflight | Parse a trace, estimate steps, validate format, and return current report price. |
| POST /api/checkout | Create Stripe checkout for the report. |
| GET /api/job/{job_id} | Poll report generation progress. |
| POST /api/recover | Recover report access after payment. |
Starter self-service receipts currently begin at $99 for small traces. Upload accepts JSON, JSON arrays, JSONL rows, grouped tasks, sessions, or event lists when they can be normalized safely.
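For teams preparing a trace export, this is a small sketch of the JSONL shape: one JSON object per line. The helper name `to_jsonl` and the sample events are illustrative, and whether a particular file normalizes safely is still decided by the preflight step, not by this code.

```python
import json
from typing import Any, Dict, Iterable


def to_jsonl(events: Iterable[Dict[str, Any]]) -> str:
    """Serialize events as JSONL: one compact JSON object per line."""
    return "\n".join(
        json.dumps(event, separators=(",", ":"), sort_keys=True) for event in events
    )


# Illustrative structural events only; no prompt or payload content.
events = [
    {"event": "tool_call", "tool": "search", "cost_usd": 0.002},
    {"event": "tool_result", "tool": "search", "status": "completed", "cost_usd": 0.002},
]
jsonl_text = to_jsonl(events)
print(jsonl_text)
```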
T1-T5 Structural Health
| Tier | Public framing | Meaning |
|---|---|---|
| T1 | Healthy | Diverse, structurally clean work pattern. |
| T2 | Healthy/monitor | Mostly clean structural progress. |
| T3 | Monitor | Some structural friction or weak movement. |
| T4 | Review-only | Broad review candidate. Do not market as confirmed dollar loss. |
| T5 | Critical structural evidence | Strong loop/stagnation evidence or hard structural alert. |
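When rendering tiers in a dashboard or report, the table above can be carried as a lookup so the public framing stays consistent. This is a sketch: the names `TIER_FRAMING`, `public_framing`, and `is_review_only` are hypothetical, and it assumes tiers arrive as strings such as "T4".

```python
# Public framing per tier, copied from the table above.
TIER_FRAMING = {
    "T1": "Healthy",
    "T2": "Healthy/monitor",
    "T3": "Monitor",
    "T4": "Review-only",
    "T5": "Critical structural evidence",
}


def public_framing(tier: str) -> str:
    """Return the public framing for a tier label, or a neutral fallback."""
    return TIER_FRAMING.get(tier.strip().upper(), "Unknown tier")


def is_review_only(tier: str) -> bool:
    """T4 is a broad review candidate and must not be framed as confirmed loss."""
    return tier.strip().upper() == "T4"


print(public_framing("T4"), is_review_only("T4"))
```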
Privacy Boundary
CAUM should receive structural metadata, not prompts or private payloads. The Live endpoint sanitizes sensitive fields and persists only zero-semantic structural events, but integrations should avoid sending private fields in the first place.
- Send: event kind, tool name, phase, status, token counts, cost, latency, task/session identifiers.
- Do not send: prompt text, completions, customer data, files, diffs, commands, secrets, API keys, or raw tool arguments.
- CAUM may hash identifiers and structural labels to keep reports reviewable without exposing private content.
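One way to follow the send and do-not-send lists on the client side is an allowlist applied before any event leaves the integration. The sketch below is an assumption-laden example: `ALLOWED_EVENT_FIELDS` and `to_structural_event` are hypothetical names, and `latency_ms` is an assumed field name for latency that does not appear elsewhere on this page.

```python
from typing import Any, Dict

# Structural fields only. "latency_ms" is an assumed name, not a documented one.
ALLOWED_EVENT_FIELDS = {
    "event", "tool", "phase", "status",
    "input_tokens", "output_tokens", "cost_usd", "latency_ms",
}


def to_structural_event(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every field that is not on the structural allowlist."""
    return {k: v for k, v in raw.items() if k in ALLOWED_EVENT_FIELDS}


# Illustrative raw step that mixes structural metadata with private content.
raw_step = {
    "event": "tool_call",
    "tool": "search",
    "phase": "request",
    "input_tokens": 320,
    "cost_usd": 0.002,
    "prompt": "private text that must not leave the client",
    "arguments": {"query": "customer name"},
}
safe_event = to_structural_event(raw_step)
print(sorted(safe_event))
```

An allowlist is used rather than a blocklist so that new private fields added to the agent later are excluded by default.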
Claim Audit Rules
Before publishing a number, apply this audit:
- State exactly what the number claims.
- State what it does not claim.
- Separate internal triage metrics from public metrics.
- Publish only metrics backed by direct evidence.
- Describe false positive risk.
- Remove or lower any phrase that can be misread as content-judgment, future-outcome prediction, or confirmed dollar loss.
For public claims, use hard alerts, critical T5 evidence, strong exact-cycle coverage, production replay errors, and direct exposed cost. Internal broad-review bucket percentages are audit triage only and must not be turned into public loss-rate language.