Skip to content

feat: add fire + list_claimable + claim + claim_next#23

Merged
govindkavaturi-art merged 1 commit into
mainfrom
feat/fire-and-claim-methods
May 4, 2026
Merged

feat: add fire + list_claimable + claim + claim_next#23
govindkavaturi-art merged 1 commit into
mainfrom
feat/fire-and-claim-methods

Conversation

@mikemolinet
Copy link
Copy Markdown
Collaborator

Same change as the (now-closed) fork PR #22. Re-opened from an upstream branch so CI workflows can access staging secrets and the test job can pass.

Summary

Brings cueapi-sdk toward parity with @cueapi/mcp 0.3.0 + 0.4.0 surface. Four new methods covering SEND (fire) and RECEIVE (claimable list, claim, claim-next). Heartbeat signature fix held for follow-up commit pending technical review.

What's added

CuesResource:

  • fire(cue_id, payload_override=None, merge_strategy=None) POST /v1/cues/{id}/fire

ExecutionsResource:

  • list_claimable(task=None, agent=None) GET /v1/executions/claimable?task=&agent= (server-side filter)
  • claim(execution_id, worker_id=...) POST /v1/executions/{id}/claim (atomic; 409 if already claimed)
  • claim_next(worker_id=..., task=None) POST /v1/executions/claim or list+claim chain when task provided

Notable

  • list_claimable is server-side query params, not client-side filter. Client-side filter hits the LIMIT 50 starvation bug fixed in the 2026-04-25 prod incident.
  • claim_next(task=...) fans out internally because the server's claim endpoint doesn't accept a task filter today.

Tests

42 unit tests pass (was 30; +12 net). Mirrors the existing MagicMock pattern in tests/test_executions_resource.py.

Version

0.1.3 -> 0.2.0. Aligned cueapi/init.py (had drifted to 0.1.2) with pyproject.toml at the same time.

Pending follow-up (NOT in this PR)

ExecutionsResource.heartbeat(execution_id) sends an empty body and does not include worker_id via the X-Worker-Id request header that the server reads from. Holding the signature change pending technical review of the deprecation cadence.

Copy link
Copy Markdown
Collaborator Author

@mikemolinet mikemolinet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CTO review (CC-cue-pm)

Reviewed the full diff. Substantively in good shape — server-side filtering is correctly chosen, the fire/claim/list_claimable surface mirrors @cueapi/mcp 0.3.0/0.4.0, and the heartbeat-fix-held-for-follow-up call is the right discipline given the X-Worker-Id signature change. Test coverage is solid (12 new tests, MagicMock pattern consistent with the existing executions test file).

Leaving formal approval to Govind / maintainer per the agent-self-approval rule (cross-agent / cross-author approval semantics are untested in this repo). Comments below are review notes, not blockers unless flagged.

What I checked + liked

  • Server-side task / agent query params on list_claimable — correct, avoids the LIMIT 50 starvation bug from the 2026-04-25 prod incident. Test test_list_claimable_passes_task_as_query_param pins this.
  • merge_strategy omitted-when-not-passed test (test_fire_omits_merge_strategy_when_not_passed) is exactly the right contract pin. Server's Pydantic default of \"merge\" should win when caller doesn't specify; client never silently overrides.
  • payload_override semantics — docstring correctly notes "persisted on the resulting execution row, never on the cue itself." Matches the server contract (relevant to the open #577 GET filter behavior on the hosted side).
  • claim_next fan-out branch — well-documented, handles empty list cleanly (no_executions_for_task shape).
  • CHANGELOG drift fix (__init__.py was 0.1.2, pyproject.toml was 0.1.3 — both now 0.2.0) — good cleanup.

Notes / suggestions (non-blocking unless called out)

  1. claim_next no-task branch shape divergence — worth a docstring line. When task=None, the SDK forwards whatever the server returns on POST /v1/executions/claim. When task=<x> and the listing is empty, the SDK invents {\"claimed\": False, \"reason\": \"no_executions_for_task\", \"task\": <x>}. Two slightly different shapes for "no claim available." Not wrong, but worth one line in the docstring noting that the SDK augments the empty-task case with reason for caller diagnostics, and that the server-direct path may return a different no-claim shape. Lets future callers write a single conditional cleanly.

  2. claim_next fan-out race + 409. Docstring says "caller retries" on 409. That's the right SDK posture (don't bury retries). But there's no test that simulates a 409 propagating through claim_next. Suggest one extra test where the inner _post raises (or returns a non-claimed shape if your error model is non-exception); pins the contract that the SDK doesn't swallow it. Nice-to-have, not a blocker.

  3. merge_strategy: Optional[str] — consider Literal[\"merge\", \"replace\"]. Cheap type-safety win; matches the server's enum surface. Won't break callers since str is a supertype.

  4. list_claimable returns dict (with \"executions\" key) rather than the list directly. Consistent with raw API shape, and claim_next uses .get(\"executions\") or [] defensively. Defensible. If future SDK passes ever introduce typed Execution models, this is the surface to revisit. Flagging for awareness, not asking for change.

  5. Heartbeat follow-up — agreed call. Holding the worker_id-via-X-Worker-Id signature change pending technical-direction review of deprecation cadence is correct. Worth opening a tracking issue against this repo with the deprecation plan (kw-only worker_id with a one-version DeprecationWarning?) so it doesn't fall off.

  6. CHANGELOG date says 2026-05-01, PR opened 2026-05-02. Trivial; bump on merge if the maintainer cares.

Parity context

This is the cueapi-python side of the 3-layer parity discipline (Layer 1 = hosted PR template at cueapi#615, Layer 2 = parity-manifest seeds at cueapi-python#24 / cueapi-cli#25, Layer 3 = Backlog rows on the project tracker). If this merges before #24, the manifest baseline at #24 will need a quick refresh to reflect that fire/list_claimable/claim/claim_next are now covered — easier to do as a follow-up commit on #24 than to block here.

Bottom line

Code looks good. Test coverage is appropriate. No blockers from CTO-review lens. Formal approval deferred to Govind/maintainer.

— CC-cue-pm

@mikemolinet
Copy link
Copy Markdown
Collaborator Author

Cross-agent review (cueapi-secondary, cueapi engineering)

Looks good to me. Detailed findings below; leaving formal approval to Govind/maintainer per the cross-agent self-approval pattern (validated by cue-mac-app on cueapi #592/#593).

The implementation is clean, tests are tight, and the design choice to fan out claim_next(task=…) client-side rather than push a server-side filter is the right v1 call given the existing POST /claim endpoint. Server endpoint confirmed alive at app/routers/executions.py:229 — won't 404 in production.


CI status — pre-existing flake, not a regression

The two test job failures are all pre-existing staging-secret access issues on tests/test_cues.py, not introduced by this PR:

6 failed, 43 passed, 7 errors
FAILED tests/test_cues.py::TestCueCreate::* — cueapi.exceptions.AuthenticationError: Invalid API key
ERROR  tests/test_cues.py::TestCueList::* — cueapi.exceptions.AuthenticationError: Invalid API key
…

None of those test files are in this diff. The 12 new mock-based unit tests in test_cues_resource.py and test_executions_resource.py (TestFire, TestListClaimable, TestClaim, TestClaimNext) all pass. The PR description already noted the staging-secret rebase intent; that part of the rebase didn't fully land and the staging-cred flake is still present, but it's the same flake mergeable: MERGEABLE and sdk-integration: SUCCESS runs around it.

Suggest tracking the staging-cred fix as a separate issue rather than blocking this PR on it.


Findings

1. claim_next empty-queue handling is asymmetric across branches — HIGH (behavioral)

# With task, empty queue: synthetic return value
return {"claimed": False, "reason": "no_executions_for_task", "task": task}

# Without task, empty queue: server returns 409 → SDK raises CueAPIError
return self._client._post("/v1/executions/claim", json={"worker_id": worker_id})

Server returns 409 {error: {code: "claim_failed", …}} on empty queue (confirmed at app/routers/executions.py:289-303). Caller code can't be polymorphic — they have to know which branch they took to either inspect result["claimed"] or wrap in try/except CueAPIError.

Suggested: in the no-task branch, catch CueAPIError with code == "claim_failed" and return {"claimed": False, "reason": "no_executions"} for symmetry. Keeps the surface area uniform: callers always check result["claimed"], never need to try/except for normal "queue is empty" flow.

2. Race window in claim_next(task=…) documented in PR but not in docstring or code — HIGH (doc)

PR description: "Tiny race window between list and claim is bounded by the atomic claim returning 409, in which case the caller retries."

That contract is not in the SDK:

  • The docstring says nothing about 409 retry semantics for the task branch
  • claim_next doesn't catch + retry internally
  • A caller reading just the SDK will not know they need to wrap in retry

Suggested: add to the docstring, e.g. "When task is set and another worker claims the listed execution between list and claim, raises CueAPIError(code='claim_failed') — callers should retry." Or, since this is the same shape as the server's empty-queue response, consider catching internally and re-listing once before bubbling up. Either is fine; documenting the current behavior is the minimum.

3. claim_next accepts task but not agent, despite list_claimable accepting both — MEDIUM (design)

list_claimable(task=None, agent=None) is symmetric. claim_next(worker_id=…, task=None) only forwards task. Single-purpose workers that filter by both will not have an idiomatic claim_next path.

Suggested: either add agent=None to claim_next and forward both filters in the fan-out branch, or drop agent from list_claimable until there's a use case for it. Asymmetric defaults across the SDK surface read like an oversight.

4. Heartbeat follow-up has correctness implication — track concretely — NOTE (not blocking)

PR description notes executions.heartbeat(execution_id) sends an empty body and doesn't include X-Worker-Id. Looking at server (app/routers/executions.py:332-338): if x_worker_id is provided AND execution.claimed_by_worker differs, the heartbeat returns 403. Without the header, the ownership enforcement is silently bypassed (the request still 200s).

Holding the signature change pending technical review is reasonable; just suggest opening a tracking issue with a concrete cadence (e.g. "ship in 0.3.0") so this doesn't sit indefinitely. Not blocking #23.

5. worker_id not validated client-side — LOW (polish)

claim(execution_id, worker_id="") and claim_next(worker_id="") would send {"worker_id": ""}. Server should reject via Pydantic, but a client-side if not worker_id: raise ValueError(...) would fail faster with a clearer error. Especially helpful since worker_id is keyword-only and a caller could plausibly pass "" thinking it's optional.

6. fire / list_claimable / claim / claim_next return raw dict, not Pydantic models — NIT (follow-up)

Existing methods return Cue (resources/cues.py create/get/update). New methods return dict. Defensible — the response shapes for fire/claim/claim_next are simple and may evolve while the surface is young — but the inconsistency is worth a follow-up issue tracking response-model coverage. Future SDK consumers will want type hints on result.execution_id etc.

7. CHANGELOG date is 2026-05-01 — NIT

PR was opened 2 days ago; today is 2026-05-04. If the convention is "merge date," this needs updating at merge. If "intended release date," fine — but worth picking one.


What works well

  • Server-side filter via query params on list_claimable is the right call — explicit comment in the test file pinning the rationale (LIMIT 50 starvation bug from the 2026-04-25 prod incident) is exactly the kind of context that prevents future regressions.
  • test_fire_omits_merge_strategy_when_not_passed is a tight contract pin — exactly the right kind of test to keep server-side defaults working as the SDK evolves.
  • Keyword-only args (*, worker_id) on claim and claim_next prevent positional-arg drift across versions. Good defensive choice.
  • The PR description's "Pending follow-up" callout on heartbeat is the right pattern — flagging known gaps in the description rather than letting them slip silently.

cueapi-secondary (CueAPI engineering)

Copy link
Copy Markdown
Member

@govindkavaturi-art govindkavaturi-art left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid foundation: fire + list_claimable + claim + claim_next. The CHANGELOG "Pending follow-up" callout for heartbeat() missing worker_id is exactly the kind of debt-acknowledgment that prevents silent drift later. Approve.

Brings the Python SDK toward parity with the @cueapi/mcp 0.3.0 + 0.4.0
surface. Adds four new methods covering the SEND side (fire) and the
RECEIVE side (claimable list, claim, claim-next). Heartbeat signature
fix is held for a follow-up commit pending technical review.

CuesResource:
- fire(cue_id, payload_override=None, merge_strategy=None)
  POST /v1/cues/{id}/fire. For ad-hoc one-shot triggers and for using
  cues as a messaging channel between agents (carry message, instruction,
  task, reply_cue_id in payload_override).

ExecutionsResource:
- list_claimable(task=None, agent=None)
  GET /v1/executions/claimable?task=&agent=
  Filters server-side via query params (NOT client-side). Required for
  single-purpose workers; client-side filter after fetch hits the LIMIT
  50 starvation bug fixed in the 2026-04-25 prod incident.

- claim(execution_id, worker_id=...)
  POST /v1/executions/{id}/claim
  Atomic; returns 409 if already claimed.

- claim_next(worker_id=..., task=None)
  POST /v1/executions/claim (no task) OR list+claim chain (with task).
  Server's claim endpoint does not accept a task filter today; with task
  the SDK fans out (list_claimable filtered, pick oldest, claim by ID).
  Tiny race window between list and claim is bounded by the atomic claim
  returning 409, in which case the caller retries.

Version: 0.1.3 -> 0.2.0. Aligned cueapi/__init__.py (had drifted to
0.1.2) with pyproject.toml at the same time.

Tests: 42 unit tests pass (was 30; +12 net). Mirrors the existing
MagicMock pattern in tests/test_executions_resource.py.

Pending follow-up:
- ExecutionsResource.heartbeat(execution_id) currently sends an empty
  body and does not include worker_id via the X-Worker-Id request
  header that the server reads from. Worker-id is what the server uses
  to enforce ownership on heartbeat (returns 403 on mismatch); without
  it the race-protection check is silently bypassed. A signature change
  to add worker_id is held pending technical review of the deprecation
  cadence (additive kwarg-only with default-warn-on-omit vs hard
  signature change in a major bump).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@govindkavaturi-art govindkavaturi-art force-pushed the feat/fire-and-claim-methods branch from b2c61c3 to caf33a4 Compare May 4, 2026 18:14
@govindkavaturi-art govindkavaturi-art merged commit ffecf24 into main May 4, 2026
mikemolinet added a commit that referenced this pull request May 9, 2026
…t recent ports (#36)

Manifest was 3 days stale; many endpoints listed as missing have
been ported since the last audit.

Moved from endpoints_missing → endpoints_covered (with PR refs):

  - POST /v1/cues/{id}/fire (PR #23; in-flight kwargs in #33)
  - POST /v1/executions/{id}/replay (PR #25)
  - GET /v1/executions/claimable (PR #23)
  - POST /v1/executions/{id}/claim (PR #23)
  - POST /v1/executions/claim (PR #23)
  - GET /v1/workers + DELETE /v1/workers/{id} (PR #26)
  - GET /v1/usage (PR #26)
  - POST /v1/agents + GET/PATCH/DELETE /v1/agents/{ref}
    + GET /v1/agents/{ref}/webhook-secret
    + GET /v1/agents/{ref}/inbox + /sent (PR #27)
  - POST /v1/messages + GET/read/ack (PR #28)

Added in-flight refs (open PRs):

  - GET /v1/agents/roster (in-flight PR #35; cueapi #630 parity)
  - GET /v1/agents/{ref}/presence (in-flight PR #35; cueapi #662 parity)
  - send_at + exit_criteria + idempotency_key kwargs on fire (PR #33)
  - send_at kwarg on messages.send (PR #34)

New endpoints_missing items (post-audit):

  - POST /v1/agents/{ref}/webhook-secret/regenerate (destructive; tracked)
  - DELETE /v1/messages bulk (cueapi #650; bounded by cueapi-cli upstream)
  - POST /v1/executions/{id}/live-claim (cueapi #664; handler-runtime, not SDK)

New "in_flight_ports_2026_05_07" section listing all 4 currently-open
SDK PRs with PR-overlap notes (PR #30/#33 lane-flagged with cueapi-main).

Bumped sdk_version_at_audit 0.1.3 → 0.2.x.

This refresh closes the Backlog row "Refresh cueapi-python parity-manifest.json"
filed earlier today (Self-flag 2026-05-07).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants