Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
@ardaerzin the files that matters most:
|
| self.router.add_api_route( | ||
| "/{trace_id}", | ||
| self.fetch_trace, | ||
| methods=["GET"], | ||
| operation_id="fetch_trace", | ||
| status_code=status.HTTP_200_OK, | ||
| response_model=TraceResponse, | ||
| response_model_exclude_none=True, | ||
| ) |
There was a problem hiding this comment.
🟡 TracesRouter registers GET / and GET /{trace_id} before POST /query, causing GET /query to be routed to fetch_trace handler
In TracesRouter.__init__, the route /{trace_id} (GET) is registered at line 977 before /query (POST) at line 987. In FastAPI, /{trace_id} is a path-parameter route that matches any single path segment.
Route Shadowing Details
While /query and /ingest are POST endpoints (so they don't conflict with the GET /{trace_id}), there's a subtle issue: if a client mistakenly sends GET /preview/traces/query or GET /preview/traces/ingest, FastAPI will match these against /{trace_id} with trace_id="query" or trace_id="ingest" respectively. The fetch_trace handler will then attempt to parse "query" or "ingest" as a trace UUID, fail, and return a suppressed empty TraceResponse() (due to @suppress_exceptions).
This means route-not-found errors are silently swallowed instead of returning a proper 404/405 Method Not Allowed. More importantly, GET /preview/traces/query silently returns {"count": 0} instead of a 405, which could confuse API consumers.
Was this helpful? React with 👍 or 👎 to provide feedback.
Railway Preview Environment
Updated at 2026-02-24T14:18:53.560Z |
loadables in apiloadables in api
There was a problem hiding this comment.
Pull request overview
This PR extends the loadables feature by aligning retrieval behavior across testsets and queries, cleaning up the tracing architecture, and introducing clean trace-specific DTOs and endpoints. The changes implement three strategies for retrieving loadable items (by content, by IDs, or by full items with pagination), separate concerns between routers/services/workers, and consolidate tracing type definitions in the SDK.
Changes:
- Extended testset and query revision retrieval with
include_*_idsandinclude_*flags for controlled expansion - Introduced dedicated traces router (
/preview/traces/*) with trace-specific DTOs (TraceResponse,TracesResponse,TracesQueryRequest) - Refactored tracing utilities from monolithic module into organized subpackage (attributes, parsing, trees, filtering, hashing, simple_traces)
- Moved canonical tracing types to SDK (
agenta.sdk.models.tracing) and re-exported in API core - Removed test that was deleted (
test_observability_trace_lifecycle) - Updated documentation with detailed loadables querying strategies
Reviewed changes
Copilot reviewed 71 out of 74 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/agenta/sdk/models/tracing.py | Consolidated tracing type definitions with backward-compatible aliases |
| api/oss/src/core/tracing/dtos.py | Re-exports SDK tracing types, adds query exceptions |
| api/oss/src/core/queries/service.py | Added _populate_traces() for query revision trace expansion |
| api/oss/src/apis/fastapi/tracing/models.py | New trace/span-specific request/response models |
| api/oss/src/core/tracing/utils/* | Split monolithic utils into organized submodules |
| docs/designs/loadables/*.md | Comprehensive loadables design documentation |
| Multiple test files | New unit tests for tracing utilities |
| EE organization/workspace files | Type consolidation into core modules |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def get_span_from_trace(trace: Optional[Trace], span_id: str) -> Optional[Span]: | ||
| if not trace or not trace.spans: | ||
| return None | ||
| for span in trace.spans.values(): | ||
| if isinstance(span, list): | ||
| for item in span: | ||
| if item and item.span_id == span_id: | ||
| return item | ||
| elif span and span.span_id == span_id: | ||
| return span | ||
| return None |
There was a problem hiding this comment.
🔴 get_span_from_trace only searches top-level spans, missing nested children
The get_span_from_trace function at api/oss/src/core/tracing/utils/trees.py:493-503 only iterates over the top-level trace.spans values. It does not recurse into nested span.spans children. This function is used by SpansRouter.fetch_span at api/oss/src/apis/fastapi/tracing/router.py:970-976 to serve GET /preview/spans/{trace_id}/{span_id}.
Root Cause
The Agenta trace tree format nests child spans inside parent spans via the spans field (e.g., root.spans.child_span). The get_span_from_trace function only checks trace.spans.values() — the top-level entries — and never recurses into span.spans for children:
def get_span_from_trace(trace, span_id):
for span in trace.spans.values(): # only top-level
if isinstance(span, list):
for item in span:
if item and item.span_id == span_id:
return item
elif span and span.span_id == span_id:
return span
return None # child spans are never foundImpact: GET /preview/spans/{trace_id}/{span_id} returns null for any child span (non-root span), even though the span exists in the trace. Only root-level spans can be fetched by this endpoint.
| def get_span_from_trace(trace: Optional[Trace], span_id: str) -> Optional[Span]: | |
| if not trace or not trace.spans: | |
| return None | |
| for span in trace.spans.values(): | |
| if isinstance(span, list): | |
| for item in span: | |
| if item and item.span_id == span_id: | |
| return item | |
| elif span and span.span_id == span_id: | |
| return span | |
| return None | |
| def get_span_from_trace(trace: Optional[Trace], span_id: str) -> Optional[Span]: | |
| if not trace or not trace.spans: | |
| return None | |
| def _search(spans_dict): | |
| for span in spans_dict.values(): | |
| if isinstance(span, list): | |
| for item in span: | |
| if item and item.span_id == span_id: | |
| return item | |
| if item and hasattr(item, 'spans') and item.spans: | |
| found = _search(item.spans) | |
| if found: | |
| return found | |
| else: | |
| if span and span.span_id == span_id: | |
| return span | |
| if span and hasattr(span, 'spans') and span.spans: | |
| found = _search(span.spans) | |
| if found: | |
| return found | |
| return None | |
| return _search(trace.spans) |
Was this helpful? React with 👍 or 👎 to provide feedback.
| } | ||
|
|
||
|
|
||
| def parse_simple_trace(trace: Optional[OTelTraceTree]) -> Optional[ParsedSimpleTrace]: |
There was a problem hiding this comment.
🟡 parse_simple_trace receives Trace object but type-annotated and coded for OTelTraceTree (dict), causing AttributeError
After the refactor, fetch_trace returns Optional[Trace] (a Pydantic model with .trace_id and .spans), but parse_simple_trace is annotated as accepting Optional[OTelTraceTree] which is Dict[str, OTelSpansTree]. Internally it calls extract_root_span(trace) which does trace.spans — this works on the Trace model. However, the callers in annotations/service.py and invocations/service.py pass the Trace object directly to parse_simple_trace, while the old code passed an OTelSpansTree (which also had .spans). The type annotation is wrong but the runtime behavior happens to work because both Trace and OTelSpansTree have a .spans attribute.
However, query_traces in annotations/service.py:758 and invocations/service.py:674 returns Traces (a List[Trace]), and the code iterates over it passing each Trace to parse_simple_trace. The old code iterated over list(spans_response.traces.values()) which yielded OTelSpansTree objects. Since Trace extends SpansTree (which is OTelSpansTree), the .spans attribute is present and extract_root_span works correctly at runtime. The type annotation mismatch is misleading but not a runtime bug.
Analysis
This is a type annotation inconsistency rather than a runtime bug. Trace(TraceID, SpansTree) inherits .spans from SpansTree, which is the same as OTelSpansTree. So extract_root_span at simple_traces.py:92-102 will work correctly because trace.spans resolves properly on both types.
| def parse_simple_trace(trace: Optional[OTelTraceTree]) -> Optional[ParsedSimpleTrace]: | |
| def parse_simple_trace(trace: Optional[Union[OTelTraceTree, "Trace"]]) -> Optional[ParsedSimpleTrace]: |
Was this helpful? React with 👍 or 👎 to provide feedback.
| include_trace_ids_off = ( | ||
| query_revision_retrieve_request.include_trace_ids is not True | ||
| ) | ||
| include_traces_off = query_revision_retrieve_request.include_traces is not True | ||
| should_cache = include_trace_ids_off and include_traces_off |
There was a problem hiding this comment.
🟡 Asymmetric caching condition: testset should_cache uses is False while query uses is not True, causing different behavior for None defaults
The testset retrieve caching condition at testsets/router.py:1367-1373 uses is False checks:
include_testcase_ids_off = (testset_revision_retrieve_request.include_testcase_ids is False)
include_testcases_off = (testset_revision_retrieve_request.include_testcases is False)The query retrieve caching condition at queries/router.py:965-969 uses is not True checks:
include_trace_ids_off = (query_revision_retrieve_request.include_trace_ids is not True)
include_traces_off = (query_revision_retrieve_request.include_traces is not True)Root Cause and Impact
For testsets, is False means caching only happens when the flag is explicitly set to False. When the flag is None (default), None is False evaluates to False, so no caching occurs on default requests. This is correct for testsets since the defaults include data.
For queries, is not True means caching happens when the flag is None (default) or False. None is not True evaluates to True, so caching occurs on default requests. This is correct for queries since the defaults exclude trace data.
While the behavior is intentionally asymmetric to match the different default semantics (testset defaults = include everything; query defaults = include nothing), the code uses two different comparison idioms to express this. This is not a runtime bug but is confusing and fragile — a future maintainer might "fix" one to match the other, breaking caching behavior.
Was this helpful? React with 👍 or 👎 to provide feedback.
| return specs_body or [] | ||
| if not specs_body: | ||
| return specs_params or [] | ||
| return [] |
There was a problem hiding this comment.
🟡 merge_specs silently discards user-provided analytics specs when both params and body are provided
When both specs_params and specs_body are truthy (i.e., the user provides custom metric specs in both query parameters and the request body), merge_specs returns an empty list []. The caller merge_analytics then replaces this empty list with hardcoded defaults, silently ignoring the user's custom spec selections.
Root Cause and Impact
The logic at api/oss/src/core/tracing/service.py:316-322 handles the "both provided" case by returning []:
if not specs_params:
return specs_body or []
if not specs_body:
return specs_params or []
return [] # Both provided → emptyThen in merge_analytics at line 339: if not specs: specs = cls.default_analytics_specs() — the empty list triggers the default fallback.
This is inconsistent with merge_queries which prefers body over params when both are present. The expected behavior should be return specs_body (body wins) instead of return [].
Impact: Any client providing analytics metric specs via both query params and request body will have both sets of specs silently ignored and replaced with system defaults. The /preview/tracing/analytics/query endpoint is affected.
| return [] | |
| return specs_body or specs_params or [] |
Was this helpful? React with 👍 or 👎 to provide feedback.
PR: Loadables Retrieval Alignment
Summary
This PR aligns loadables retrieval behavior across testsets/queries, removes stale design drift in docs, and makes the traces router a first-class traces API (no tracing-shaped response types in traces endpoints).
Change Inventory
API: Testsets
_populate_testcases(...)call-site argument binding bugs by switching to keyword arguments.order,next,limit) when enumeratingtestcase_ids.include_testcases=truereturns bothtestcasesandtestcase_ids./preview/testsets/revisions/retrievecaching policy to cache only wheninclude_testcases=false.API: Queries
next,limit) into stored windowing.tracesandtrace_ids./preview/queries/revisions/retrievecaching policy to cache only when bothinclude_trace_ids=falseandinclude_traces=false.API: Traces Router
TraceResponseTracesResponseTracesQueryRequestformattingfromTracesQueryRequest./preview/traces/queryto always return Agenta trace trees (never spans/opentelemetry formatting).TracingQueryrequest contract fromTracesRouter.query_traces; traces endpoint now consumes onlyTracesQueryRequest.query_ref,query_variant_ref,query_revision_ref).GET /preview/traces/{trace_id}) returningTraceResponse.Docs
docs/designs/loadables/loadables.querying.strategies.mdto include:windowing.nextterminologydocs/designs/loadables/loadables.initial.specs.mdexamples fromcursortonext.docs/designs/loadables/loadables.querying.gap-analysis.mdafter consolidating its useful content into the strategies document.Behavior Summary
/preview/testcases,/preview/traces) remain record-returning endpoints without extra top-level ID arrays.Validation
cd api && ruff format && ruff check --fixpytest -q oss/tests/pytest/e2e/tracing/test_traces_basics.py oss/tests/pytest/e2e/loadables/test_loadable_strategies.py27 passedpytest -q oss/tests/pytest/e2e/testsets/test_testsets_basics.py oss/tests/pytest/e2e/testsets/test_testsets_queries.py oss/tests/pytest/e2e/testsets/test_testcases_basics.py oss/tests/pytest/e2e/tracing/test_spans_basics.py oss/tests/pytest/e2e/tracing/test_spans_queries.py17 passed, 3 skipped(existing flaky skips)pytest -q oss/tests/pytest/e2e175 passed, 3 skipped