Skip to content

feat(ai): add Claude Agent SDK integration for LLM analytics#477

Open
andrewm4894 wants to merge 6 commits intomasterfrom
feat/claude-agent-sdk-integration
Open

feat(ai): add Claude Agent SDK integration for LLM analytics#477
andrewm4894 wants to merge 6 commits intomasterfrom
feat/claude-agent-sdk-integration

Conversation

@andrewm4894
Copy link
Copy Markdown
Member

Summary

  • Add posthog.ai.claude_agent_sdk module that wraps claude_agent_sdk.query() to automatically emit $ai_generation, $ai_span, and $ai_trace events
  • Two entry points: query() drop-in replacement and instrument() for configure-once reuse
  • Per-turn generation tracking via Anthropic StreamEvents with two-slot input tracking for correct tool result attribution
  • All instrumentation wrapped in try/except so PostHog errors never interrupt the underlying query
  • 16 unit tests, example scripts, sampo changeset (minor bump)

How it works

The Claude Agent SDK has no TracingProcessor interface like OpenAI Agents SDK. Instead, this integration wraps the async streaming iterator from query(), enables include_partial_messages=True to receive raw Anthropic StreamEvents, and reconstructs per-turn $ai_generation events from message_start / message_stop boundaries. Tool uses emit $ai_span events, and ResultMessage triggers a $ai_trace with aggregate cost/latency.

Test plan

  • 16 unit tests passing (uv run pytest posthog/test/ai/claude_agent_sdk/ -v)
  • Live tested against EU PostHog project — generations, spans, and traces visible with correct input/output, token counts, costs, and cache metrics
  • Multi-turn queries with tool calls (Read, Glob, Bash) produce correct event tree
  • CI passes

Add posthog.ai.claude_agent_sdk module that wraps claude_agent_sdk.query()
to automatically emit $ai_generation, $ai_span, and $ai_trace events.

- PostHogClaudeAgentProcessor with _GenerationTracker that reconstructs
  per-turn generation metrics from Anthropic StreamEvents
- Two entry points: query() drop-in replacement and instrument() for
  configure-once reuse
- Two-slot input tracking to correctly associate tool results with
  subsequent generations despite SDK message ordering
- All instrumentation wrapped in try/except so PostHog errors never
  interrupt the underlying Claude Agent SDK query
- 16 unit tests covering generation, multi-turn, fallback, tool spans,
  traces, privacy mode, personless mode, custom properties
- Example scripts (simple_query.py, instrument_reuse.py)
@andrewm4894 andrewm4894 self-assigned this Apr 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

posthog-python Compliance Report

Date: 2026-04-01 12:25:25 UTC
Duration: 194ms

✅ All Tests Passed!

0/0 tests passed


Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a8383a167

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 1, 2026

Comments Outside Diff (1)

  1. posthog/test/ai/claude_agent_sdk/test_processor.py, line 989-1067 (link)

    P2 Prefer parameterised tests over repeated near-identical test classes

    TestGenerationEmission, TestToolSpanEmission, TestTraceEmission, TestPrivacyMode, and TestCustomProperties all follow the same structure: build a messages list, patch original_query, consume the generator, then assert on a specific captured event type and its properties.

    Per the project's testing conventions, these should be collapsed into a single @pytest.mark.parametrize test. Each scenario becomes a parameter tuple of (messages, expected_event, expected_props), which keeps the mechanics in one place and makes it easy to add new event-type assertions without duplicating the patch/consume/filter boilerplate.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: posthog/test/ai/claude_agent_sdk/test_processor.py
    Line: 989-1067
    
    Comment:
    **Prefer parameterised tests over repeated near-identical test classes**
    
    `TestGenerationEmission`, `TestToolSpanEmission`, `TestTraceEmission`, `TestPrivacyMode`, and `TestCustomProperties` all follow the same structure: build a `messages` list, patch `original_query`, consume the generator, then assert on a specific captured event type and its properties.
    
    Per the project's testing conventions, these should be collapsed into a single `@pytest.mark.parametrize` test. Each scenario becomes a parameter tuple of `(messages, expected_event, expected_props)`, which keeps the mechanics in one place and makes it easy to add new event-type assertions without duplicating the patch/consume/filter boilerplate.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
This is a comment left during a code review.
Path: posthog/ai/claude_agent_sdk/processor.py
Line: 356-359

Comment:
**Per-call privacy override silently ignored for `$ai_input` / `$ai_output_choices`**

`_emit_generation` accepts a `privacy` parameter (line 337) that carries the per-call `posthog_privacy_mode` override, but then calls `self._with_privacy_mode()` which only consults the *instance-level* `self._privacy_mode`. The `privacy` argument is never used.

Compare to `_emit_tool_span` (line 451) which correctly gates the field on the local `privacy` flag:

```python
if not privacy and not (hasattr(self._client, "privacy_mode") and self._client.privacy_mode):
    properties["$ai_input_state"] = ...
```

As a result, calling `processor.query(posthog_privacy_mode=True)` will redact `$ai_input_state` in span events but will **not** redact `$ai_input` / `$ai_output_choices` in generation events. The same bug exists in `_emit_generation_from_result` (lines 405–408).

```suggestion
        if input_messages is not None:
            properties["$ai_input"] = None if privacy else self._with_privacy_mode(input_messages)
        if output_choices is not None:
            properties["$ai_output_choices"] = None if privacy else self._with_privacy_mode(output_choices)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/ai/claude_agent_sdk/processor.py
Line: 154-176

Comment:
**Fragile mutate-and-restore pattern for `self._groups` repeated four times**

`_capture_event` reads `self._groups` directly (line 173). To pass the correct per-call `groups` value, every emit helper saves, overwrites, and restores `self._groups`:

```python
saved_groups = self._groups
self._groups = groups
self._capture_event(...)
self._groups = saved_groups
```

This pattern is duplicated in `_emit_generation` (lines 369–372), `_emit_generation_from_result` (lines 423–426), `_emit_tool_span` (lines 457–460), and `_emit_trace` (lines 491–494). Any exception thrown between the assignment and the restore would leave `self._groups` in a corrupted state. Additionally, a shared processor used across two concurrent async tasks could observe the wrong groups value.

The clean fix is to give `_capture_event` a `groups` parameter and remove all four save/restore blocks:

```python
def _capture_event(
    self,
    event: str,
    properties: Dict[str, Any],
    distinct_id: Optional[str] = None,
    groups: Optional[Dict[str, Any]] = None,
) -> None:
    ...
    self._client.capture(
        distinct_id=distinct_id or "unknown",
        event=event,
        properties=final_properties,
        groups=groups if groups is not None else self._groups,
    )
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/test/ai/claude_agent_sdk/test_processor.py
Line: 989-1067

Comment:
**Prefer parameterised tests over repeated near-identical test classes**

`TestGenerationEmission`, `TestToolSpanEmission`, `TestTraceEmission`, `TestPrivacyMode`, and `TestCustomProperties` all follow the same structure: build a `messages` list, patch `original_query`, consume the generator, then assert on a specific captured event type and its properties.

Per the project's testing conventions, these should be collapsed into a single `@pytest.mark.parametrize` test. Each scenario becomes a parameter tuple of `(messages, expected_event, expected_props)`, which keeps the mechanics in one place and makes it easy to add new event-type assertions without duplicating the patch/consume/filter boilerplate.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat(ai): add Claude Agent SDK integrati..." | Re-trigger Greptile

- Honor per-call privacy override for $ai_input/$ai_output_choices
  in generation events (was only checking instance-level privacy)
- Pass groups directly to _capture_event instead of fragile
  save/restore pattern on self._groups (thread-safe, exception-safe)
- Fix tool span parent linkage: use tracker.current_span_id for
  in-progress generation instead of stale current_generation_span_id
andrewm4894 added a commit to PostHog/llm-analytics-apps that referenced this pull request Apr 1, 2026
Standalone script that tests posthog.ai.claude_agent_sdk integration.
Supports single-shot and interactive modes. Requires local posthog-python
with the claude_agent_sdk integration (PostHog/posthog-python#477).

Usage:
  uv pip install -e ../posthog-python
  uv run --no-sync scripts/test_claude_agent_sdk.py
  uv run --no-sync scripts/test_claude_agent_sdk.py --interactive
…tions

Wraps ClaudeSDKClient to instrument receive_response() with the same
generation/span/trace tracking as query(). Supports multi-turn
conversations with full history — each turn emits its own $ai_generation
events, all linked by a shared $ai_trace_id. The $ai_trace event is
emitted on disconnect() to cover the entire session.

Usage:
    async with PostHogClaudeSDKClient(options, posthog_client=ph) as client:
        await client.query("Hello")
        async for msg in client.receive_response():
            ...
        await client.query("Follow up")  # has conversation history
        async for msg in client.receive_response():
            ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant