Skip to content

[feat] Extend loadables#3811

Open
jp-agenta wants to merge 20 commits intomainfrom
feat/extend-loadables-in-api
Open

[feat] Extend loadables#3811
jp-agenta wants to merge 20 commits intomainfrom
feat/extend-loadables-in-api

Conversation

@jp-agenta
Copy link
Member

@jp-agenta jp-agenta commented Feb 23, 2026

PR: Loadables Retrieval Alignment

Summary

This PR aligns loadables retrieval behavior across testsets/queries, removes stale design drift in docs, and makes the traces router a first-class traces API (no tracing-shaped response types in traces endpoints).

Change Inventory

API: Testsets

  1. Fixed _populate_testcases(...) call-site argument binding bugs by switching to keyword arguments.
  2. Added deterministic ID-level windowing support for testset revision retrieval (order, next, limit) when enumerating testcase_ids.
  3. Kept revision retrieval semantics where include_testcases=true returns both testcases and testcase_ids.
  4. Updated /preview/testsets/revisions/retrieve caching policy to cache only when include_testcases=false.

API: Queries

  1. Extended query revision population to merge request pagination (next, limit) into stored windowing.
  2. Kept revision retrieval semantics where trace expansion can return both traces and trace_ids.
  3. Updated /preview/queries/revisions/retrieve caching policy to cache only when both include_trace_ids=false and include_traces=false.
  4. Added permission coupling: query revision retrieve with trace expansion requires trace-view permission in addition to query-view permission.

API: Traces Router

  1. Introduced traces-specific DTOs:
    • TraceResponse
    • TracesResponse
    • TracesQueryRequest
  2. Removed formatting from TracesQueryRequest.
  3. Forced /preview/traces/query to always return Agenta trace trees (never spans/opentelemetry formatting).
  4. Removed external TracingQuery request contract from TracesRouter.query_traces; traces endpoint now consumes only TracesQueryRequest.
  5. Added query permission coupling for ref-dereferenced traces query (query_ref, query_variant_ref, query_revision_ref).
  6. Added native traces router fetch handler (GET /preview/traces/{trace_id}) returning TraceResponse.

Docs

  1. Updated docs/designs/loadables/loadables.querying.strategies.md to include:
    • include-flag defaults
    • conditional caching behavior
    • permission coupling notes
    • windowing.next terminology
  2. Updated docs/designs/loadables/loadables.initial.specs.md examples from cursor to next.
  3. Removed redundant docs/designs/loadables/loadables.querying.gap-analysis.md after consolidating its useful content into the strategies document.

Behavior Summary

  1. Revision endpoints remain the control surface for ID/item expansion.
  2. Record endpoints (/preview/testcases, /preview/traces) remain record-returning endpoints without extra top-level ID arrays.
  3. Traces router now has a clean traces-only contract and response types.

Validation

  1. Formatting/lint:
    • cd api && ruff format && ruff check --fix
  2. Targeted e2e:
    • pytest -q oss/tests/pytest/e2e/tracing/test_traces_basics.py oss/tests/pytest/e2e/loadables/test_loadable_strategies.py
    • Result: 27 passed
  3. Broader e2e:
    • pytest -q oss/tests/pytest/e2e/testsets/test_testsets_basics.py oss/tests/pytest/e2e/testsets/test_testsets_queries.py oss/tests/pytest/e2e/testsets/test_testcases_basics.py oss/tests/pytest/e2e/tracing/test_spans_basics.py oss/tests/pytest/e2e/tracing/test_spans_queries.py
    • Result: 17 passed, 3 skipped (existing flaky skips)
  4. Full e2e suite:
    • pytest -q oss/tests/pytest/e2e
    • Result: 175 passed, 3 skipped

Open with Devin

@vercel
Copy link

vercel bot commented Feb 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Feb 27, 2026 10:14am

Request Review

@jp-agenta jp-agenta requested a review from ardaerzin February 23, 2026 19:36
@jp-agenta jp-agenta requested a review from mmabrouk February 23, 2026 20:06
@jp-agenta jp-agenta marked this pull request as ready for review February 23, 2026 20:07
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. Backend documentation Improvements or additions to documentation labels Feb 23, 2026
@jp-agenta
Copy link
Member Author

jp-agenta commented Feb 23, 2026

@ardaerzin the files that matters most:

  • docs/designs/loadables/loadables.querying.strategies.md
  • docs/designs/loadables/loadables.endpoints.specs.md

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 potential issues.

View 8 additional findings in Devin Review.

Open in Devin Review

Comment on lines 976 to 984
self.router.add_api_route(
"/{trace_id}",
self.fetch_trace,
methods=["GET"],
operation_id="fetch_trace",
status_code=status.HTTP_200_OK,
response_model=TraceResponse,
response_model_exclude_none=True,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 TracesRouter registers GET / and GET /{trace_id} before POST /query, causing GET /query to be routed to fetch_trace handler

In TracesRouter.__init__, the route /{trace_id} (GET) is registered at line 977 before /query (POST) at line 987. In FastAPI, /{trace_id} is a path-parameter route that matches any single path segment.

Route Shadowing Details

While /query and /ingest are POST endpoints (so they don't conflict with the GET /{trace_id}), there's a subtle issue: if a client mistakenly sends GET /preview/traces/query or GET /preview/traces/ingest, FastAPI will match these against /{trace_id} with trace_id="query" or trace_id="ingest" respectively. The fetch_trace handler will then attempt to parse "query" or "ingest" as a trace UUID, fail, and return a suppressed empty TraceResponse() (due to @suppress_exceptions).

This means route-not-found errors are silently swallowed instead of returning a proper 404/405 Method Not Allowed. More importantly, GET /preview/traces/query silently returns {"count": 0} instead of a 405, which could confuse API consumers.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 23, 2026

Railway Preview Environment

Status Destroyed (PR converted to draft)

Updated at 2026-02-24T14:18:53.560Z

Copy link
Member

@mmabrouk mmabrouk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jp-agenta lgtm

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 23, 2026
@jp-agenta jp-agenta changed the title [chore] Extend loadables in api [feat] Extend loadables in api Feb 24, 2026
@jp-agenta jp-agenta marked this pull request as draft February 24, 2026 14:18
@jp-agenta jp-agenta changed the title [feat] Extend loadables in api [feat] Extend loadables Feb 26, 2026
@junaway junaway marked this pull request as ready for review February 27, 2026 08:35
Copilot AI review requested due to automatic review settings February 27, 2026 08:35
@dosubot dosubot bot added the feature label Feb 27, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the loadables feature by aligning retrieval behavior across testsets and queries, cleaning up the tracing architecture, and introducing clean trace-specific DTOs and endpoints. The changes implement three strategies for retrieving loadable items (by content, by IDs, or by full items with pagination), separate concerns between routers/services/workers, and consolidate tracing type definitions in the SDK.

Changes:

  • Extended testset and query revision retrieval with include_*_ids and include_* flags for controlled expansion
  • Introduced dedicated traces router (/preview/traces/*) with trace-specific DTOs (TraceResponse, TracesResponse, TracesQueryRequest)
  • Refactored tracing utilities from monolithic module into organized subpackage (attributes, parsing, trees, filtering, hashing, simple_traces)
  • Moved canonical tracing types to SDK (agenta.sdk.models.tracing) and re-exported in API core
  • Removed test that was deleted (test_observability_trace_lifecycle)
  • Updated documentation with detailed loadables querying strategies

Reviewed changes

Copilot reviewed 71 out of 74 changed files in this pull request and generated no comments.

Show a summary per file
File Description
sdk/agenta/sdk/models/tracing.py Consolidated tracing type definitions with backward-compatible aliases
api/oss/src/core/tracing/dtos.py Re-exports SDK tracing types, adds query exceptions
api/oss/src/core/queries/service.py Added _populate_traces() for query revision trace expansion
api/oss/src/apis/fastapi/tracing/models.py New trace/span-specific request/response models
api/oss/src/core/tracing/utils/* Split monolithic utils into organized submodules
docs/designs/loadables/*.md Comprehensive loadables design documentation
Multiple test files New unit tests for tracing utilities
EE organization/workspace files Type consolidation into core modules

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 14 additional findings in Devin Review.

Open in Devin Review

Comment on lines +493 to +503
def get_span_from_trace(trace: Optional[Trace], span_id: str) -> Optional[Span]:
if not trace or not trace.spans:
return None
for span in trace.spans.values():
if isinstance(span, list):
for item in span:
if item and item.span_id == span_id:
return item
elif span and span.span_id == span_id:
return span
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 get_span_from_trace only searches top-level spans, missing nested children

The get_span_from_trace function at api/oss/src/core/tracing/utils/trees.py:493-503 only iterates over the top-level trace.spans values. It does not recurse into nested span.spans children. This function is used by SpansRouter.fetch_span at api/oss/src/apis/fastapi/tracing/router.py:970-976 to serve GET /preview/spans/{trace_id}/{span_id}.

Root Cause

The Agenta trace tree format nests child spans inside parent spans via the spans field (e.g., root.spans.child_span). The get_span_from_trace function only checks trace.spans.values() — the top-level entries — and never recurses into span.spans for children:

def get_span_from_trace(trace, span_id):
    for span in trace.spans.values():  # only top-level
        if isinstance(span, list):
            for item in span:
                if item and item.span_id == span_id:
                    return item
        elif span and span.span_id == span_id:
            return span
    return None  # child spans are never found

Impact: GET /preview/spans/{trace_id}/{span_id} returns null for any child span (non-root span), even though the span exists in the trace. Only root-level spans can be fetched by this endpoint.

Suggested change
def get_span_from_trace(trace: Optional[Trace], span_id: str) -> Optional[Span]:
if not trace or not trace.spans:
return None
for span in trace.spans.values():
if isinstance(span, list):
for item in span:
if item and item.span_id == span_id:
return item
elif span and span.span_id == span_id:
return span
return None
def get_span_from_trace(trace: Optional[Trace], span_id: str) -> Optional[Span]:
if not trace or not trace.spans:
return None
def _search(spans_dict):
for span in spans_dict.values():
if isinstance(span, list):
for item in span:
if item and item.span_id == span_id:
return item
if item and hasattr(item, 'spans') and item.spans:
found = _search(item.spans)
if found:
return found
else:
if span and span.span_id == span_id:
return span
if span and hasattr(span, 'spans') and span.spans:
found = _search(span.spans)
if found:
return found
return None
return _search(trace.spans)
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 16 additional findings in Devin Review.

Open in Devin Review

}


def parse_simple_trace(trace: Optional[OTelTraceTree]) -> Optional[ParsedSimpleTrace]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 parse_simple_trace receives Trace object but type-annotated and coded for OTelTraceTree (dict), causing AttributeError

After the refactor, fetch_trace returns Optional[Trace] (a Pydantic model with .trace_id and .spans), but parse_simple_trace is annotated as accepting Optional[OTelTraceTree] which is Dict[str, OTelSpansTree]. Internally it calls extract_root_span(trace) which does trace.spans — this works on the Trace model. However, the callers in annotations/service.py and invocations/service.py pass the Trace object directly to parse_simple_trace, while the old code passed an OTelSpansTree (which also had .spans). The type annotation is wrong but the runtime behavior happens to work because both Trace and OTelSpansTree have a .spans attribute.

However, query_traces in annotations/service.py:758 and invocations/service.py:674 returns Traces (a List[Trace]), and the code iterates over it passing each Trace to parse_simple_trace. The old code iterated over list(spans_response.traces.values()) which yielded OTelSpansTree objects. Since Trace extends SpansTree (which is OTelSpansTree), the .spans attribute is present and extract_root_span works correctly at runtime. The type annotation mismatch is misleading but not a runtime bug.

Analysis

This is a type annotation inconsistency rather than a runtime bug. Trace(TraceID, SpansTree) inherits .spans from SpansTree, which is the same as OTelSpansTree. So extract_root_span at simple_traces.py:92-102 will work correctly because trace.spans resolves properly on both types.

Suggested change
def parse_simple_trace(trace: Optional[OTelTraceTree]) -> Optional[ParsedSimpleTrace]:
def parse_simple_trace(trace: Optional[Union[OTelTraceTree, "Trace"]]) -> Optional[ParsedSimpleTrace]:
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +965 to +969
include_trace_ids_off = (
query_revision_retrieve_request.include_trace_ids is not True
)
include_traces_off = query_revision_retrieve_request.include_traces is not True
should_cache = include_trace_ids_off and include_traces_off
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Asymmetric caching condition: testset should_cache uses is False while query uses is not True, causing different behavior for None defaults

The testset retrieve caching condition at testsets/router.py:1367-1373 uses is False checks:

include_testcase_ids_off = (testset_revision_retrieve_request.include_testcase_ids is False)
include_testcases_off = (testset_revision_retrieve_request.include_testcases is False)

The query retrieve caching condition at queries/router.py:965-969 uses is not True checks:

include_trace_ids_off = (query_revision_retrieve_request.include_trace_ids is not True)
include_traces_off = (query_revision_retrieve_request.include_traces is not True)
Root Cause and Impact

For testsets, is False means caching only happens when the flag is explicitly set to False. When the flag is None (default), None is False evaluates to False, so no caching occurs on default requests. This is correct for testsets since the defaults include data.

For queries, is not True means caching happens when the flag is None (default) or False. None is not True evaluates to True, so caching occurs on default requests. This is correct for queries since the defaults exclude trace data.

While the behavior is intentionally asymmetric to match the different default semantics (testset defaults = include everything; query defaults = include nothing), the code uses two different comparison idioms to express this. This is not a runtime bug but is confusing and fragile — a future maintainer might "fix" one to match the other, breaking caching behavior.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 18 additional findings in Devin Review.

Open in Devin Review

return specs_body or []
if not specs_body:
return specs_params or []
return []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 merge_specs silently discards user-provided analytics specs when both params and body are provided

When both specs_params and specs_body are truthy (i.e., the user provides custom metric specs in both query parameters and the request body), merge_specs returns an empty list []. The caller merge_analytics then replaces this empty list with hardcoded defaults, silently ignoring the user's custom spec selections.

Root Cause and Impact

The logic at api/oss/src/core/tracing/service.py:316-322 handles the "both provided" case by returning []:

if not specs_params:
    return specs_body or []
if not specs_body:
    return specs_params or []
return []  # Both provided → empty

Then in merge_analytics at line 339: if not specs: specs = cls.default_analytics_specs() — the empty list triggers the default fallback.

This is inconsistent with merge_queries which prefers body over params when both are present. The expected behavior should be return specs_body (body wins) instead of return [].

Impact: Any client providing analytics metric specs via both query params and request body will have both sets of specs silently ignored and replaced with system defaults. The /preview/tracing/analytics/query endpoint is affected.

Suggested change
return []
return specs_body or specs_params or []
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend documentation Improvements or additions to documentation feature lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants