Observation-only API

CAUM Product Docs

CAUM observes agent work structurally and detects when compute stops converting cleanly into progress, without reading private content.

Truth lock.

CAUM does not judge semantic truth, does not classify answer content, does not block agents, and does not claim universal future-outcome prediction. T1-T5 are structural health tiers. T4 is review-only. T5 and hard alerts are stronger structural evidence.

Product Surfaces

CAUM Receipt

Upload a trace file and receive a paid structural report with loop, cycle, token, cost, and evidence grading.

Live Meter

Send structural events while an agent runs. CAUM returns observe-only health and evidence fields.

Pilot Meter

Analyze grouped tasks from a pilot, compare task-level structural exposure, and estimate reviewable cost.

Research

Historical AUC/Cohen results remain research context, not the production sales claim.

Live Meter API

Base URL: https://caum-observation-production.up.railway.app

Method	Endpoint	Purpose
`GET`	`/v2/live/health`	Check deployed Live Meter status.
`POST`	`/v2/live/start`	Start an observe-only live session.
`POST`	`/v2/live/event`	Append one structural event and receive current health.
`POST`	`/v2/live/batch`	Analyze a batch of structural events.
`GET`	`/v2/live/session/{session_id}`	Read current durable session state with a valid session token.

Start a Session

For customer-bound pilots, add Authorization: Bearer caum_live_YOUR_KEY to start, event, batch, and session requests. Public demo mode works without a bearer token. Customer-bound sessions require two credentials on every follow-up request: the same API key that started the session, and the session token.

curl -X POST https://caum-observation-production.up.railway.app/v2/live/start \
  -H "Authorization: Bearer caum_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "task_id": "ticket-1842",
    "agent_id": "support-agent",
    "workflow": "customer-support",
    "baseline_cost_usd": 0.08
  }'

The response includes session.session_id and a one-time session_token. Store the token client-side for the duration of that running task; CAUM stores only a hash of it. Live Meter persists sanitized structural events and chain state, not raw payloads.
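
Unpacking the start response into a small client-side holder keeps the token in one place. The following is a minimal sketch, assuming the response shape described above (session.session_id and session_token). LiveSession and parse_start_response are hypothetical names for illustration, not part of any CAUM SDK.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LiveSession:
    """Client-side holder for one running task's credentials (hypothetical helper)."""
    session_id: str
    # repr=False keeps the token out of logs that print the object.
    session_token: str = field(repr=False)

    def event_body(self, event: dict) -> dict:
        """Build the JSON body for POST /v2/live/event."""
        return {
            "session_id": self.session_id,
            "session_token": self.session_token,
            "event": event,
        }


def parse_start_response(payload: dict) -> LiveSession:
    """Extract the session ID and token from a /v2/live/start response.

    Assumes payload["session"]["session_id"] and payload["session_token"].
    """
    try:
        return LiveSession(
            session_id=payload["session"]["session_id"],
            session_token=payload["session_token"],
        )
    except (KeyError, TypeError) as exc:
        raise ValueError("start response is missing session_id or session_token") from exc
```

Keeping the holder immutable reflects that the token belongs to a single running task and should not be swapped mid-session.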

Append Events

curl -X POST https://caum-observation-production.up.railway.app/v2/live/event \
  -H "Authorization: Bearer caum_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "SESSION_ID_FROM_START",
    "session_token": "SESSION_TOKEN_FROM_START",
    "event": {
      "event": "tool_call",
      "tool": "search",
      "phase": "request",
      "input_tokens": 320,
      "cost_usd": 0.002
    }
  }'

Minimal Python Client

import requests

API = "https://caum-observation-production.up.railway.app"
HEADERS = {"Authorization": "Bearer caum_live_YOUR_KEY"}

# Start an observe-only session.
start = requests.post(f"{API}/v2/live/start", json={
    "tenant_id": "acme",
    "agent_id": "agent_1",
    "workflow": "coding_agent"
}, headers=HEADERS, timeout=10)
start.raise_for_status()
session = start.json()

sid = session["session"]["session_id"]
token = session["session_token"]  # keep this for the life of the running task

# Placeholder: replace with your agent's own step records.
# Each step is expected to expose the attributes used below.
agent_steps = []

for step in agent_steps:
    # Send structural metadata only: no prompts, completions, or tool arguments.
    resp = requests.post(f"{API}/v2/live/event", json={
        "session_id": sid,
        "session_token": token,
        "event": {
            "event": step.kind,
            "tool": step.tool_name,
            "phase": step.phase,
            "input_tokens": step.input_tokens,
            "output_tokens": step.output_tokens,
            "cost_usd": step.cost_usd
        }
    }, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    result = resp.json()
    print(result["tier"], result["evidence_grade"]["public_class"])

Important Response Fields

Field	Meaning	Public?
`tier`	T1-T5 structural health band.	Yes, with boundary text.
`structural_health`	Detailed structural review score and alert state.	Use carefully.
`evidence_grade.public_class`	Conservative public evidence class such as `hard_alert` or review-only class.	Yes.
`allowed_to_block`	Always false. CAUM is observation-only.	Yes.
`is_failure_claim`	Always false. CAUM does not claim semantic failure.	Yes.
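
A client can reduce each response to the publicly safe fields and confirm the two always-false flags before displaying anything. This is a sketch under assumptions: it treats allowed_to_block and is_failure_claim as top-level keys and tier as a string such as "T5", neither of which is confirmed by the table above. summarize_live_response is a hypothetical helper name.

```python
def summarize_live_response(result: dict) -> dict:
    """Reduce a Live Meter response to the fields marked public above.

    Assumes allowed_to_block and is_failure_claim are top-level keys;
    adjust the lookups if your responses nest them elsewhere.
    """
    # Both flags are documented as always false. Anything else is treated
    # as an integration error rather than acted on.
    if result.get("allowed_to_block", False) or result.get("is_failure_claim", False):
        raise ValueError("unexpected flag: CAUM responses are observation-only")

    tier = result["tier"]
    public_class = result["evidence_grade"]["public_class"]
    return {
        "tier": tier,
        "public_class": public_class,
        # T5 and hard alerts are the stronger structural evidence classes.
        "strong_evidence": tier == "T5" or public_class == "hard_alert",
    }
```

The summary deliberately omits structural_health, which the table marks as a field to use carefully.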

Pilot Meter API

Pilot Meter accepts grouped task events or flat events. It is useful when a team wants first-pass structural evidence before buying a deeper integration. CAUM can start from common agent-stack baselines and then calibrate to the customer's workflow as more traces arrive.

curl -X POST https://caum-observation-production.up.railway.app/v2/pilot-meter \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "pilot_id": "week_1",
    "baseline_cost_usd_per_task": 0.08,
    "tasks": [{
      "task_id": "task_1",
      "agent_id": "agent_a",
      "steps": [
        {"event":"tool_call","tool":"search","cost_usd":0.002},
        {"event":"tool_result","tool":"search","status":"completed","cost_usd":0.002}
      ]
    }]
  }'

The response includes portfolio totals, task-level structural health, cost fields, profile detection, and evidence_grade. Cost fields are review evidence, not a claim that every reviewed dollar was wasted.
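
Teams that only have flat events can group them into the task shape used in the request above before posting. The sketch below assumes each flat event is a dict that carries its own task_id and agent_id keys, which is a hypothetical convention for this example; build_pilot_payload is not a CAUM function.

```python
from collections import defaultdict


def build_pilot_payload(tenant_id, pilot_id, baseline_cost_usd_per_task, flat_events):
    """Group flat structural events into the grouped-task request body.

    Assumes every event dict has "task_id" and optionally "agent_id";
    all remaining keys are passed through unchanged as the step.
    """
    steps_by_task = defaultdict(list)
    agent_by_task = {}
    for event in flat_events:
        task_id = event["task_id"]
        # The first agent_id seen for a task is used for the whole task.
        agent_by_task.setdefault(task_id, event.get("agent_id"))
        step = {k: v for k, v in event.items() if k not in ("task_id", "agent_id")}
        steps_by_task[task_id].append(step)

    return {
        "tenant_id": tenant_id,
        "pilot_id": pilot_id,
        "baseline_cost_usd_per_task": baseline_cost_usd_per_task,
        "tasks": [
            {"task_id": task_id, "agent_id": agent_by_task[task_id], "steps": steps}
            for task_id, steps in steps_by_task.items()
        ],
    }
```

Tasks appear in the order their first event was seen, and steps keep their original order within each task.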

PDF Report Upload

The self-service paid path lives at /upload/. Use it when the customer has a trace export but no live integration yet.

Endpoint	Purpose
`POST /api/preflight`	Parse a trace, estimate steps, validate format, and return current report price.
`POST /api/checkout`	Create Stripe checkout for the report.
`GET /api/job/{job_id}`	Poll report generation progress.
`POST /api/recover`	Recover report access after payment.

Starter self-service receipts currently begin at $99 for small traces. Upload accepts JSON, JSON arrays, JSONL rows, grouped tasks, sessions, or event lists when they can be normalized safely.
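
One way to produce a JSONL trace export is to write one structural event per line. This is only a sketch: the row fields mirror the event examples elsewhere on this page, and whether a given file normalizes safely is determined by the preflight step, not by this helper. to_jsonl is a hypothetical name.

```python
import json


def to_jsonl(events) -> str:
    """Serialize an iterable of event dicts as JSONL, one JSON object per line."""
    return "".join(json.dumps(event, sort_keys=True) + "\n" for event in events)
```

Sorting keys keeps the export stable between runs, which makes two exports of the same trace easy to compare.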

T1-T5 Structural Health

Tier	Public framing	Meaning
T1	Healthy	Diverse, structurally clean work pattern.
T2	Healthy/monitor	Mostly clean structural progress.
T3	Monitor	Some structural friction or weak movement.
T4	Review-only	Broad review candidate. Do not market as confirmed dollar loss.
T5	Critical structural evidence	Strong loop/stagnation evidence or hard structural alert.
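
When tiers are shown in a dashboard or report, the public framing from the table can be applied with a simple lookup. The sketch assumes the tier field arrives as one of the strings "T1" through "T5"; public_framing is a hypothetical helper.

```python
# Public framing per tier, copied from the table above.
TIER_FRAMING = {
    "T1": "Healthy",
    "T2": "Healthy/monitor",
    "T3": "Monitor",
    "T4": "Review-only",
    "T5": "Critical structural evidence",
}


def public_framing(tier: str) -> str:
    """Return the public framing for a tier, rejecting unknown values."""
    try:
        return TIER_FRAMING[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

Rejecting unknown values avoids silently displaying a label that the table does not define.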

Privacy Boundary

CAUM should receive structural metadata, not prompts or private payloads. The Live endpoint sanitizes sensitive fields and persists only zero-semantic structural events, but integrations should avoid sending private fields in the first place.

Send: event kind, tool name, phase, status, token counts, cost, latency, task/session identifiers.
Do not send: prompt text, completions, customer data, files, diffs, commands, secrets, API keys, or raw tool arguments.
CAUM may hash identifiers and structural labels to keep reports reviewable without exposing private content.
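
On the client side, the send and do-not-send lists above can be enforced with an allow-list applied before any request is built. A minimal sketch follows; the field names match the event examples on this page, except latency_ms, which is an assumed name for the latency field. to_structural_event is a hypothetical helper.

```python
# Structural fields that may leave the client. "latency_ms" is an assumed
# field name; the other names come from the event examples on this page.
ALLOWED_EVENT_FIELDS = frozenset({
    "event", "tool", "phase", "status",
    "input_tokens", "output_tokens", "cost_usd", "latency_ms",
})


def to_structural_event(step: dict) -> dict:
    """Keep only allow-listed structural fields and drop unset values.

    Anything not on the allow-list (prompt text, tool arguments, files,
    secrets) is removed before the event is sent.
    """
    return {
        key: value
        for key, value in step.items()
        if key in ALLOWED_EVENT_FIELDS and value is not None
    }
```

An allow-list fails closed: a new private field added to the agent's step records is dropped by default instead of being sent until someone notices.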

Claim Audit Rules

Before publishing a number, apply this audit:

State exactly what the number claims.
State what it does not claim.
Separate internal triage metrics from public metrics.
Publish only metrics backed by direct evidence.
Describe false positive risk.
Remove or lower any phrase that can be misread as content-judgment, future-outcome prediction, or confirmed dollar loss.

Current public evidence floor.

Use hard alerts, critical T5 evidence, strong exact-cycle coverage, production replay errors, and direct exposed cost. Internal broad-review bucket percentages are audit triage only and must not be turned into public loss-rate language.