Observation-only API

CAUM Product Docs

CAUM observes agent work structurally and detects when compute stops converting cleanly into progress, without reading private content.

Truth lock.

CAUM does not judge semantic truth, does not classify answer content, does not block agents, and does not claim universal future-outcome prediction. T1-T5 are structural health tiers. T4 is review-only. T5 and hard alerts are stronger structural evidence.

Product Surfaces

CAUM Receipt

Upload a trace file and receive a paid structural report with loop, cycle, token, cost, and evidence grading.

Live Meter

Send structural events while an agent runs. CAUM returns observe-only health and evidence fields.

Pilot Meter

Analyze grouped tasks from a pilot, compare task-level structural exposure, and estimate reviewable cost.

Research

Historical AUC/Cohen results remain research context, not the production sales claim.

Live Meter API

Base URL: https://caum-observation-production.up.railway.app

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /v2/live/health | Check deployed Live Meter status. |
| POST | /v2/live/start | Start an observe-only live session. |
| POST | /v2/live/event | Append one structural event and receive current health. |
| POST | /v2/live/batch | Analyze a batch of structural events. |
| GET | /v2/live/session/{session_id} | Read current durable session state with a valid session token. |
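
As a rough sketch, the endpoints listed here can be collected into one lookup so client code does not repeat path strings. The `ENDPOINTS` table and the `live_url` helper are illustrative names, not part of CAUM.

```python
BASE_URL = "https://caum-observation-production.up.railway.app"

# Method and path for each documented Live Meter endpoint.
ENDPOINTS = {
    "health": ("GET", "/v2/live/health"),
    "start": ("POST", "/v2/live/start"),
    "event": ("POST", "/v2/live/event"),
    "batch": ("POST", "/v2/live/batch"),
    "session": ("GET", "/v2/live/session/{session_id}"),
}


def live_url(name, **path_params):
    """Return (method, full URL) for a named endpoint (hypothetical helper)."""
    method, path = ENDPOINTS[name]
    return method, BASE_URL + path.format(**path_params)


print(live_url("health"))
print(live_url("session", session_id="SESSION_ID_FROM_START"))
```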

Start a Session

For customer-bound pilots, add Authorization: Bearer caum_live_YOUR_KEY to start, event, batch, and session requests. Public demo mode works without a bearer token. Customer-bound sessions require both the session token and the same API key that started the session.
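
One way to handle both modes in client code is a small header builder, sketched below. The `build_headers` name is hypothetical. It only assembles the headers described above and sends nothing.

```python
def build_headers(api_key=None):
    """Build request headers; api_key=None corresponds to public demo mode."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Customer-bound pilots send the key as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


print(build_headers())                      # demo mode, no Authorization header
print(build_headers("caum_live_YOUR_KEY"))  # customer-bound mode
```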

curl -X POST https://caum-observation-production.up.railway.app/v2/live/start \
  -H "Authorization: Bearer caum_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "task_id": "ticket-1842",
    "agent_id": "support-agent",
    "workflow": "customer-support",
    "baseline_cost_usd": 0.08
  }'

The response includes session.session_id and a one-time session_token. Store the token client-side for the duration of that running task; CAUM stores only a hash of it. Live Meter persists sanitized structural events and chain state, not raw payloads.
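
A minimal sketch of holding the two values client-side, assuming the response shape described above (`session.session_id` plus a top-level `session_token`). The `LiveSession` class and the sample values are made up for illustration. The token is excluded from the object's repr so it does not end up in logs by accident.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LiveSession:
    session_id: str
    # repr=False keeps the one-time token out of printed or logged objects.
    session_token: str = field(repr=False)


def parse_start_response(body):
    """Extract the session id and one-time token from a start response dict."""
    return LiveSession(
        session_id=body["session"]["session_id"],
        session_token=body["session_token"],
    )


# Made-up sample values; a real response comes from POST /v2/live/start.
sample = {"session": {"session_id": "sess_123"}, "session_token": "tok_abc"}
live = parse_start_response(sample)
print(live)
```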

Append Events

curl -X POST https://caum-observation-production.up.railway.app/v2/live/event \
  -H "Authorization: Bearer caum_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "SESSION_ID_FROM_START",
    "session_token": "SESSION_TOKEN_FROM_START",
    "event": {
      "event": "tool_call",
      "tool": "search",
      "phase": "request",
      "input_tokens": 320,
      "cost_usd": 0.002
    }
  }'
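
The same request body can be assembled in Python. This is a sketch: `event_request` is a hypothetical helper, and treating the token and cost fields as optional is an assumption based only on the curl example omitting `output_tokens`.

```python
import json


def event_request(session_id, session_token, kind, tool, phase,
                  input_tokens=None, output_tokens=None, cost_usd=None):
    """Build the JSON body for POST /v2/live/event (hypothetical helper)."""
    event = {"event": kind, "tool": tool, "phase": phase}
    optional = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": cost_usd,
    }
    # Assumption: fields that were not measured are left out, not sent as null.
    event.update({k: v for k, v in optional.items() if v is not None})
    return {
        "session_id": session_id,
        "session_token": session_token,
        "event": event,
    }


body = event_request("SESSION_ID_FROM_START", "SESSION_TOKEN_FROM_START",
                     "tool_call", "search", "request",
                     input_tokens=320, cost_usd=0.002)
print(json.dumps(body, indent=2))
```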

Minimal Python Client

import requests

API = "https://caum-observation-production.up.railway.app"
HEADERS = {"Authorization": "Bearer caum_live_YOUR_KEY"}

# Start an observe-only session and keep the one-time session token.
resp = requests.post(f"{API}/v2/live/start", json={
    "tenant_id": "acme",
    "agent_id": "agent_1",
    "workflow": "coding_agent"
}, headers=HEADERS, timeout=10)
resp.raise_for_status()
session = resp.json()

sid = session["session"]["session_id"]
token = session["session_token"]

# agent_steps is your agent's own iterable of step records;
# it is not provided by CAUM.
for step in agent_steps:
    resp = requests.post(f"{API}/v2/live/event", json={
        "session_id": sid,
        "session_token": token,
        "event": {
            "event": step.kind,
            "tool": step.tool_name,
            "phase": step.phase,
            "input_tokens": step.input_tokens,
            "output_tokens": step.output_tokens,
            "cost_usd": step.cost_usd
        }
    }, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    result = resp.json()
    print(result["tier"], result["evidence_grade"]["public_class"])

Important Response Fields

| Field | Meaning | Public? |
| --- | --- | --- |
| tier | T1-T5 structural health band. | Yes, with boundary text. |
| structural_health | Detailed structural review score and alert state. | Use carefully. |
| evidence_grade.public_class | Conservative public evidence class such as hard_alert or review-only class. | Yes. |
| allowed_to_block | Always false. CAUM is observation-only. | Yes. |
| is_failure_claim | Always false. CAUM does not claim semantic failure. | Yes. |
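
A client might read these fields as sketched below. The `summarize_result` helper is hypothetical, and the sample dict is hand-written rather than captured from the API. Representing the tier as a string such as "T4" is an assumption.

```python
def summarize_result(result):
    """Pull the public-safe fields out of a Live Meter response dict."""
    # Both flags are documented as always false; treat anything else as a bug
    # in the integration rather than as permission to block an agent.
    if result.get("allowed_to_block") or result.get("is_failure_claim"):
        raise ValueError("unexpected flag: documented as always false")
    return {
        "tier": result["tier"],
        "public_class": result["evidence_grade"]["public_class"],
    }


# Hand-written sample; the "T5" string format is an assumption.
sample = {
    "tier": "T5",
    "evidence_grade": {"public_class": "hard_alert"},
    "allowed_to_block": False,
    "is_failure_claim": False,
}
print(summarize_result(sample))
```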

Pilot Meter API

Pilot Meter accepts grouped task events or flat events. It is useful when a team wants first-pass structural evidence before buying a deeper integration. CAUM can start from common agent-stack baselines and then calibrate to the customer's workflow as more traces arrive.

curl -X POST https://caum-observation-production.up.railway.app/v2/pilot-meter \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "pilot_id": "week_1",
    "baseline_cost_usd_per_task": 0.08,
    "tasks": [{
      "task_id": "task_1",
      "agent_id": "agent_a",
      "steps": [
        {"event":"tool_call","tool":"search","cost_usd":0.002},
        {"event":"tool_result","tool":"search","status":"completed","cost_usd":0.002}
      ]
    }]
  }'

The response includes portfolio totals, task-level structural health, cost fields, profile detection, and evidence_grade. Cost fields are review evidence, not a claim that every reviewed dollar was wasted.
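
If events are logged flat but the grouped form is preferred, a payload like the one in the curl example can be built as in the sketch below. `group_steps` and `pilot_payload` are hypothetical helpers, and the assumption that each flat row carries its own `task_id` and `agent_id` is for illustration only.

```python
def group_steps(flat_steps):
    """Group flat step rows into tasks keyed by (task_id, agent_id)."""
    tasks = {}
    for row in flat_steps:
        step = dict(row)  # copy so the caller's rows are not mutated
        key = (step.pop("task_id"), step.pop("agent_id"))
        tasks.setdefault(key, []).append(step)
    return [
        {"task_id": task_id, "agent_id": agent_id, "steps": steps}
        for (task_id, agent_id), steps in tasks.items()
    ]


def pilot_payload(tenant_id, pilot_id, baseline_cost_usd_per_task, flat_steps):
    """Build the JSON body for POST /v2/pilot-meter in grouped form."""
    return {
        "tenant_id": tenant_id,
        "pilot_id": pilot_id,
        "baseline_cost_usd_per_task": baseline_cost_usd_per_task,
        "tasks": group_steps(flat_steps),
    }


rows = [
    {"task_id": "task_1", "agent_id": "agent_a",
     "event": "tool_call", "tool": "search", "cost_usd": 0.002},
    {"task_id": "task_1", "agent_id": "agent_a",
     "event": "tool_result", "tool": "search", "status": "completed",
     "cost_usd": 0.002},
]
payload = pilot_payload("acme", "week_1", 0.08, rows)
print(payload)
```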

PDF Report Upload

The self-service paid path lives at /upload/. Use it when the customer has a trace export but no live integration yet.

| Endpoint | Purpose |
| --- | --- |
| POST /api/preflight | Parse a trace, estimate steps, validate format, and return current report price. |
| POST /api/checkout | Create Stripe checkout for the report. |
| GET /api/job/{job_id} | Poll report generation progress. |
| POST /api/recover | Recover report access after payment. |

Starter self-service receipts currently begin at $99 for small traces. Upload accepts JSON, JSON arrays, JSONL rows, grouped tasks, sessions, or event lists when they can be normalized safely.
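
Before uploading, it can help to confirm locally that an export parses as JSON or JSONL. The sketch below is a local pre-check only. It is not CAUM's parser and does not reproduce the normalization rules that /api/preflight applies. The `load_trace` name is hypothetical.

```python
import json


def load_trace(text):
    """Parse trace text as one JSON document, falling back to JSONL rows."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        # Multi-line JSONL is not a single JSON document; parse row by row.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Wrap a single object so callers always get a list of records.
    return doc if isinstance(doc, list) else [doc]


as_array = '[{"event": "tool_call"}, {"event": "tool_result"}]'
as_jsonl = '{"event": "tool_call"}\n{"event": "tool_result"}\n'
print(len(load_trace(as_array)), len(load_trace(as_jsonl)))
```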

T1-T5 Structural Health

| Tier | Public framing | Meaning |
| --- | --- | --- |
| T1 | Healthy | Diverse, structurally clean work pattern. |
| T2 | Healthy/monitor | Mostly clean structural progress. |
| T3 | Monitor | Some structural friction or weak movement. |
| T4 | Review-only | Broad review candidate. Do not market as confirmed dollar loss. |
| T5 | Critical structural evidence | Strong loop/stagnation evidence or hard structural alert. |
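
For dashboards or reports, the public framing in this table can be kept in one mapping so tier labels stay consistent. This is a sketch: `TIER_FRAMING` and `public_label` are illustrative names, and the "T1" to "T5" string keys are an assumption about how the tier is represented.

```python
# Public framing per tier, copied from the table above.
TIER_FRAMING = {
    "T1": "Healthy",
    "T2": "Healthy/monitor",
    "T3": "Monitor",
    "T4": "Review-only",
    "T5": "Critical structural evidence",
}


def public_label(tier):
    """Return the public framing for a tier, with the T4 boundary text."""
    label = TIER_FRAMING[tier]
    if tier == "T4":
        # T4 is a broad review candidate, never a confirmed dollar loss.
        label += " (review candidate, not confirmed dollar loss)"
    return label


for tier in TIER_FRAMING:
    print(tier, "->", public_label(tier))
```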

Privacy Boundary

CAUM should receive structural metadata, not prompts or private payloads. The Live endpoint sanitizes sensitive fields and persists only zero-semantic structural events, but integrations should avoid sending private fields in the first place.
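
One way to avoid sending private fields is a client-side allowlist applied before any request, sketched below. The field set is taken from the examples in this document and is an assumption, not an official schema. `structural_only` is a hypothetical helper.

```python
# Structural fields that appear in this document's examples (assumed allowlist).
STRUCTURAL_FIELDS = {
    "event", "tool", "phase", "status",
    "input_tokens", "output_tokens", "cost_usd",
}


def structural_only(step):
    """Drop every key that is not on the structural allowlist."""
    return {k: v for k, v in step.items() if k in STRUCTURAL_FIELDS}


raw_step = {
    "event": "tool_call",
    "tool": "search",
    "phase": "request",
    "input_tokens": 320,
    "prompt": "private customer text",  # must never leave the client
}
print(structural_only(raw_step))
```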

Claim Audit Rules

Before publishing a number, apply this audit:

  1. State exactly what the number claims.
  2. State what it does not claim.
  3. Separate internal triage metrics from public metrics.
  4. Publish only metrics backed by direct evidence.
  5. Describe false positive risk.
  6. Remove or lower any phrase that can be misread as content-judgment, future-outcome prediction, or confirmed dollar loss.

Current public evidence floor.

Use hard alerts, critical T5 evidence, strong exact-cycle coverage, production replay errors, and direct exposed cost. Internal broad-review bucket percentages are audit triage only and must not be turned into public loss-rate language.