feat(ai): add Claude Agent SDK integration for LLM analytics by andrewm4894 · Pull Request #477 · PostHog/posthog-python

andrewm4894 · 2026-04-01T11:12:10Z

Summary

Add posthog.ai.claude_agent_sdk module that wraps claude_agent_sdk.query() to automatically emit $ai_generation, $ai_span, and $ai_trace events
Two entry points: query() drop-in replacement and instrument() for configure-once reuse
Per-turn generation tracking via Anthropic StreamEvents with two-slot input tracking for correct tool result attribution
All instrumentation wrapped in try/except so PostHog errors never interrupt the underlying query
16 unit tests, example scripts, sampo changeset (minor bump)

How it works

The Claude Agent SDK has no TracingProcessor interface like OpenAI Agents SDK. Instead, this integration wraps the async streaming iterator from query(), enables include_partial_messages=True to receive raw Anthropic StreamEvents, and reconstructs per-turn $ai_generation events from message_start / message_stop boundaries. Tool uses emit $ai_span events, and ResultMessage triggers a $ai_trace with aggregate cost/latency.

Test plan

16 unit tests passing (uv run pytest posthog/test/ai/claude_agent_sdk/ -v)
Live tested against EU PostHog project — generations, spans, and traces visible with correct input/output, token counts, costs, and cache metrics
Multi-turn queries with tool calls (Read, Glob, Bash) produce correct event tree
CI passes

Add posthog.ai.claude_agent_sdk module that wraps claude_agent_sdk.query() to automatically emit $ai_generation, $ai_span, and $ai_trace events. - PostHogClaudeAgentProcessor with _GenerationTracker that reconstructs per-turn generation metrics from Anthropic StreamEvents - Two entry points: query() drop-in replacement and instrument() for configure-once reuse - Two-slot input tracking to correctly associate tool results with subsequent generations despite SDK message ordering - All instrumentation wrapped in try/except so PostHog errors never interrupt the underlying Claude Agent SDK query - 16 unit tests covering generation, multi-turn, fallback, tool spans, traces, privacy mode, personless mode, custom properties - Example scripts (simple_query.py, instrument_reuse.py)

github-actions · 2026-04-01T11:13:00Z

posthog-python Compliance Report

Date: 2026-04-01 12:25:25 UTC
Duration: 194ms

✅ All Tests Passed!

0/0 tests passed

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a8383a167

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

posthog/ai/claude_agent_sdk/processor.py

greptile-apps · 2026-04-01T11:17:16Z

Comments Outside Diff (1)

posthog/test/ai/claude_agent_sdk/test_processor.py, line 989-1067 (link)

Prefer parameterised tests over repeated near-identical test classes

TestGenerationEmission, TestToolSpanEmission, TestTraceEmission, TestPrivacyMode, and TestCustomProperties all follow the same structure: build a messages list, patch original_query, consume the generator, then assert on a specific captured event type and its properties.

Per the project's testing conventions, these should be collapsed into a single @pytest.mark.parametrize test. Each scenario becomes a parameter tuple of (messages, expected_event, expected_props), which keeps the mechanics in one place and makes it easy to add new event-type assertions without duplicating the patch/consume/filter boilerplate.

Prompt To Fix With AI

This is a comment left during a code review.
Path: posthog/test/ai/claude_agent_sdk/test_processor.py
Line: 989-1067

Comment:
**Prefer parameterised tests over repeated near-identical test classes**

`TestGenerationEmission`, `TestToolSpanEmission`, `TestTraceEmission`, `TestPrivacyMode`, and `TestCustomProperties` all follow the same structure: build a `messages` list, patch `original_query`, consume the generator, then assert on a specific captured event type and its properties.

Per the project's testing conventions, these should be collapsed into a single `@pytest.mark.parametrize` test. Each scenario becomes a parameter tuple of `(messages, expected_event, expected_props)`, which keeps the mechanics in one place and makes it easy to add new event-type assertions without duplicating the patch/consume/filter boilerplate.

How can I resolve this? If you propose a fix, please make it concise.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Prompt To Fix All With AI

This is a comment left during a code review.
Path: posthog/ai/claude_agent_sdk/processor.py
Line: 356-359

Comment:
**Per-call privacy override silently ignored for `$ai_input` / `$ai_output_choices`**

`_emit_generation` accepts a `privacy` parameter (line 337) that carries the per-call `posthog_privacy_mode` override, but then calls `self._with_privacy_mode()` which only consults the *instance-level* `self._privacy_mode`. The `privacy` argument is never used.

Compare to `_emit_tool_span` (line 451) which correctly gates the field on the local `privacy` flag:

```python
if not privacy and not (hasattr(self._client, "privacy_mode") and self._client.privacy_mode):
    properties["$ai_input_state"] = ...
```

As a result, calling `processor.query(posthog_privacy_mode=True)` will redact `$ai_input_state` in span events but will **not** redact `$ai_input` / `$ai_output_choices` in generation events. The same bug exists in `_emit_generation_from_result` (lines 405–408).

```suggestion
        if input_messages is not None:
            properties["$ai_input"] = None if privacy else self._with_privacy_mode(input_messages)
        if output_choices is not None:
            properties["$ai_output_choices"] = None if privacy else self._with_privacy_mode(output_choices)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/ai/claude_agent_sdk/processor.py
Line: 154-176

Comment:
**Fragile mutate-and-restore pattern for `self._groups` repeated four times**

`_capture_event` reads `self._groups` directly (line 173). To pass the correct per-call `groups` value, every emit helper saves, overwrites, and restores `self._groups`:

```python
saved_groups = self._groups
self._groups = groups
self._capture_event(...)
self._groups = saved_groups
```

This pattern is duplicated in `_emit_generation` (lines 369–372), `_emit_generation_from_result` (lines 423–426), `_emit_tool_span` (lines 457–460), and `_emit_trace` (lines 491–494). Any exception thrown between the assignment and the restore would leave `self._groups` in a corrupted state. Additionally, a shared processor used across two concurrent async tasks could observe the wrong groups value.

The clean fix is to give `_capture_event` a `groups` parameter and remove all four save/restore blocks:

```python
def _capture_event(
    self,
    event: str,
    properties: Dict[str, Any],
    distinct_id: Optional[str] = None,
    groups: Optional[Dict[str, Any]] = None,
) -> None:
    ...
    self._client.capture(
        distinct_id=distinct_id or "unknown",
        event=event,
        properties=final_properties,
        groups=groups if groups is not None else self._groups,
    )
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: posthog/test/ai/claude_agent_sdk/test_processor.py
Line: 989-1067

Comment:
**Prefer parameterised tests over repeated near-identical test classes**

`TestGenerationEmission`, `TestToolSpanEmission`, `TestTraceEmission`, `TestPrivacyMode`, and `TestCustomProperties` all follow the same structure: build a `messages` list, patch `original_query`, consume the generator, then assert on a specific captured event type and its properties.

Per the project's testing conventions, these should be collapsed into a single `@pytest.mark.parametrize` test. Each scenario becomes a parameter tuple of `(messages, expected_event, expected_props)`, which keeps the mechanics in one place and makes it easy to add new event-type assertions without duplicating the patch/consume/filter boilerplate.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "feat(ai): add Claude Agent SDK integrati..." | Re-trigger Greptile}

posthog/ai/claude_agent_sdk/processor.py

- Honor per-call privacy override for $ai_input/$ai_output_choices in generation events (was only checking instance-level privacy) - Pass groups directly to _capture_event instead of fragile save/restore pattern on self._groups (thread-safe, exception-safe) - Fix tool span parent linkage: use tracker.current_span_id for in-progress generation instead of stale current_generation_span_id

Standalone script that tests posthog.ai.claude_agent_sdk integration. Supports single-shot and interactive modes. Requires local posthog-python with the claude_agent_sdk integration (PostHog/posthog-python#477). Usage: uv pip install -e ../posthog-python uv run --no-sync scripts/test_claude_agent_sdk.py uv run --no-sync scripts/test_claude_agent_sdk.py --interactive

…tions Wraps ClaudeSDKClient to instrument receive_response() with the same generation/span/trace tracking as query(). Supports multi-turn conversations with full history — each turn emits its own $ai_generation events, all linked by a shared $ai_trace_id. The $ai_trace event is emitted on disconnect() to cover the entire session. Usage: async with PostHogClaudeSDKClient(options, posthog_client=ph) as client: await client.query("Hello") async for msg in client.receive_response(): ... await client.query("Follow up") # has conversation history async for msg in client.receive_response(): ...

andrewm4894 self-assigned this Apr 1, 2026

chatgpt-codex-connector bot reviewed Apr 1, 2026

View reviewed changes

posthog/ai/claude_agent_sdk/processor.py Outdated Show resolved Hide resolved

posthog/ai/claude_agent_sdk/processor.py Outdated Show resolved Hide resolved

greptile-apps bot reviewed Apr 1, 2026

View reviewed changes

posthog/ai/claude_agent_sdk/processor.py Outdated Show resolved Hide resolved

posthog/ai/claude_agent_sdk/processor.py Show resolved Hide resolved

andrewm4894 mentioned this pull request Apr 1, 2026

feat(llma): use PostHog SDK integration for stamphog LLM analytics PostHog/posthog#53008

Open

andrewm4894 requested a review from a team April 1, 2026 11:20

andrewm4894 added 2 commits April 1, 2026 12:23

style: ruff format claude_agent_sdk files

61cdeb5

andrewm4894 mentioned this pull request Apr 1, 2026

chore(llma): Add PostHog LLM Analytics instrumentation to PR approval agent PostHog/posthog#52984

Closed

andrewm4894 added 3 commits April 1, 2026 13:05

docs(ai): add multi-turn conversation example for Claude Agent SDK

794a8f7

refactor(ai): split PostHogClaudeSDKClient into client.py

4916988

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ai): add Claude Agent SDK integration for LLM analytics#477

feat(ai): add Claude Agent SDK integration for LLM analytics#477
andrewm4894 wants to merge 6 commits intomasterfrom
feat/claude-agent-sdk-integration

andrewm4894 commented Apr 1, 2026

Uh oh!

github-actions bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot commented Apr 1, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andrewm4894 commented Apr 1, 2026

Summary

How it works

Test plan

Uh oh!

github-actions bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

posthog-python Compliance Report

✅ All Tests Passed!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

greptile-apps bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comments Outside Diff (1)

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Apr 1, 2026 •

edited

Loading

greptile-apps bot commented Apr 1, 2026 •

edited

Loading