From 607978b81e825e486248491c7ac9e10a13bfc1b4 Mon Sep 17 00:00:00 2001 From: mikemolinet Date: Mon, 11 May 2026 18:04:01 -0700 Subject: [PATCH] docs: body-verify guide for self-hosters (cueapi/cueapi #801 parity) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Private cueapi #801 added docs/guides/body-verify.mdx (Mintlify, 247 lines) documenting the X-CueAPI-Verify-Echo primitive for hosted users. cueapi-core ships the SAME substrate (Layer 1 from #86 + Layer 1.5 universal middleware from #87 + STRING-shape spec-lock from #88) but had no user-facing guide. Self-hosters running cueapi-core need this. ## Scope cueapi-core-flavored adaptation — plain MD (not MDX), substrate-focused (narrower than the hosted MDX guide which covers SDK install + hosted- specific patterns). Sections: 1. The bug class — caller-side shell expansion ($(...), `...`, ${VAR}) 2. How it works — X-CueAPI-Verify-Echo header + response shape 3. Substrate architecture — Layer 1 (endpoint-specific STRING shape on /v1/messages + /v1/cues/{id}/fire) vs Layer 1.5 (universal middleware on all other POST/PATCH/PUT endpoints, OBJECT shape) 4. SDK auto-verify summary table (python / cli / mcp / action), with the messages-default-on vs fire-opt-in distinction explicit 5. Defense-in-depth layers — substrate + SDK + force-file + docs 6. When to disable + opt-out semantics 7. Implementation file references (app/utils/verify_echo.py, app/middleware/verify_echo.py, app/routers/{messages,cues}.py) 8. Background — Mike directive 2026-05-11; cross-stack ship history ## Out of scope - Per-language SDK install / API-key bootstrap content — already in each SDK's own README. This guide links rather than duplicates. - The hosted-specific Mintlify mint.json registration — cueapi-core uses plain MD; doc discovery via the docs/ dir, not Mintlify. ## Build / lint - Plain Markdown; no build step required. (Other docs in this dir follow the same convention — see quickstart.md, configuration.md.) --- README.md | 1 + docs/body-verify.md | 113 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+) create mode 100644 docs/body-verify.md diff --git a/README.md b/README.md index ffdeb78..68db985 100644 --- a/README.md +++ b/README.md @@ -381,6 +381,7 @@ CueAPI does one thing: schedules tasks and tells you whether they succeeded. | [Configuration](docs/configuration.md) | Every environment variable documented | | [Workers](docs/workers.md) | Worker transport setup and scaling | | [Production](docs/production.md) | PostgreSQL, Redis, reverse proxy, monitoring | +| [Body-integrity verification](docs/body-verify.md) | Opt-in echo-back primitive that detects caller-side shell expansion + silent body corruption | | [FAQ](docs/faq.md) | Common self-hosting questions | **Stack:** FastAPI, PostgreSQL 16, Redis 7, Docker diff --git a/docs/body-verify.md b/docs/body-verify.md new file mode 100644 index 0000000..25e97af --- /dev/null +++ b/docs/body-verify.md @@ -0,0 +1,113 @@ +# Body integrity verification + +cueapi-core ships an opt-in body-integrity primitive that protects against silent body-content corruption — most commonly caller-side shell expansion of metacharacters (`$(...)`, backticks, `${VAR}`) in body arguments BEFORE the request leaves the caller's environment. + +This is the open-core substrate behind the SDK-level auto-verify in [cueapi-python](https://github.com/cueapi/cueapi-python) and [cueapi-cli](https://github.com/cueapi/cueapi-cli). Self-hosters running cueapi-core get the substrate for free; their SDKs talk to it the same way the hosted SDKs do. + +## The bug class this prevents + +```bash +BODY="message with $(date) timestamp" +curl -X POST https://your-cueapi-instance/v1/messages \ + -H "Authorization: Bearer $CUEAPI_API_KEY" \ + -H "X-Cueapi-From-Agent: $SENDER" \ + -H "Content-Type: application/json" \ + -d "{\"to\":\"$RECIPIENT\",\"body\":\"$BODY\"}" +``` + +Bash evaluates `$(date)` at variable-assignment time. The body you THOUGHT you were sending is gone before curl runs. The substrate accepts the mutated JSON with HTTP 200. The recipient sees corrupted content. No fail-loud signal anywhere in the pipeline. + +LLM-generated curl examples, CI scripts, and bash automation reproduce the pattern at scale. The verify-echo primitive surfaces it loudly. + +## How it works + +Add `X-CueAPI-Verify-Echo: true` to any POST/PATCH/PUT request with a body. The substrate echoes the body it received back in the response under `body_received` (plus a SHA-256 hex digest under `body_received_sha256` for constant-cost compare): + +```bash +curl -X POST https://your-cueapi-instance/v1/messages \ + -H "Authorization: Bearer $CUEAPI_API_KEY" \ + -H "X-Cueapi-From-Agent: $SENDER" \ + -H "X-CueAPI-Verify-Echo: true" \ + -H "Content-Type: application/json" \ + -d @/tmp/body.json +``` + +Response (truncated): + +```json +{ + "id": "msg_xxx", + "delivered": true, + "body_received": "the verbatim body the server received", + "body_received_sha256": "<64-hex sha256 of body field bytes>" +} +``` + +Caller computes `sha256(body_bytes)` locally; compares to `body_received_sha256`; throws on drift. + +The header is opt-in. Clients without it see no behavior change — `body_received` and `body_received_sha256` are absent from the response. + +## Substrate architecture + +Two layers ship in cueapi-core: + +**Layer 1** — Endpoint-specific echo (Phase 1): `POST /v1/messages` and `POST /v1/cues/{cue_id}/fire` extract their canonical string body field (`MessageCreate.body`, `payload_override.message` if present on fire) and echo it as a STRING. SDKs verify against the string directly. + +**Layer 1.5** — Universal middleware (Phase 2): all other POST/PATCH/PUT endpoints get echo coverage via `app/middleware/verify_echo.py`. The middleware echoes the parsed request body as a JSON object. 52 endpoints covered; method-gated (GET ignored); status-gated (4xx/5xx not echoed; validation errors stay clean). + +| Endpoint shape | `body_received` shape | +|---|---| +| `POST /v1/messages` | STRING — `MessageCreate.body` verbatim | +| `POST /v1/cues/{id}/fire` | STRING — `payload_override.message` (or null when no body) | +| All other POST/PATCH/PUT | OBJECT — parsed request body as JSON | + +`body_received_sha256` is always a 64-character hex string of those exact UTF-8 bytes. + +## SDK auto-verify + +The official SDKs use `X-CueAPI-Verify-Echo: true` automatically and raise on mismatch: + +| SDK | Messages | Cues fire | Opt-out | +|---|---|---|---| +| **cueapi-python** | default-on | opt-in (`auto_verify=True`) | `auto_verify=False` | +| **cueapi-cli** | default-on | opt-in (`--verify`) | `--no-verify` | +| **cueapi-mcp** | default-on | opt-in (`auto_verify: true`) | omit flag | +| **cueapi-action** | default-on (`no-verify: "true"` opts out) | opt-in (`verify: "true"`) | — | + +Why fire is opt-in everywhere: the substrate's `/v1/cues/{cue_id}/fire` echoes a pydantic-after-parse body that may include server-side default-population, causing spurious diff vs the SDK's canonical-JSON serialization. The default-off design avoids false-positive mismatches; callers opt in when they know substrate echo matches their serialization (typical for the sha256 constant-cost path). + +## Defense-in-depth layers (recap) + +For full coverage, layer this substrate with the SDK + caller patterns: + +1. **Substrate** (this guide) — `X-CueAPI-Verify-Echo` echo-back. Open-core; ships in cueapi-core. +2. **SDK auto-verify** — clients use the header automatically and raise on mismatch. +3. **Force-file mode** — `cueapi-cli messages send --message-file ` reads bytes verbatim; rejects inline `--body` with shell metacharacters unless `--allow-inline-metachars` is set. +4. **Documentation** — guides lead with file-payload pattern, not inline strings. + +Self-hosters running cueapi-core get layers 1 + 2 automatically by using the published SDKs. Layer 3 ships in cueapi-cli; layer 4 is this guide. + +## When to disable + +Disable verify-echo only when: + +- **Perf-sensitive flows** at the very high QPS limit. The echo doubles response-payload bytes; a sustained outbound stream may want to opt out for the 5-10% saving. +- **Streaming use cases** where the response body shape is locked downstream and adding fields breaks compatibility. + +Opt-out is per-request (SDK kwarg or CLI flag); there's no global server-side disable. + +## Implementation references + +- `app/utils/verify_echo.py` — STRING-shape helper (Layer 1) +- `app/middleware/verify_echo.py` — universal middleware (Layer 1.5) +- `app/routers/messages.py`, `app/routers/cues.py` — Phase 1 endpoint integration + +Tests pin the shape: + +- `tests/test_verify_echo.py` — Layer 1 endpoint coverage +- `tests/test_verify_echo_middleware.py` — Layer 1.5 method/status/content-type gating + idempotency-when-handler-already-injected +- `tests/test_verify_endpoints.py` — verify-result endpoints + +## Background + +Added 2026-05-11 (Mike body-verify directive). Substrate primitive Layer 1 from cueapi-core #86; Layer 1.5 universal middleware from #87; STRING-shape spec-lock from #88 (echo-shape hotfix for `body_received` field on messages endpoint). Cross-stack SDK Phase 2 + caller force-file + this docs guide constitute the four-layer defense.