feat: optimize tool responses to reduce token consumption #71

Open
AdamGustavsson wants to merge 5 commits into googleanalytics:main from AdamGustavsson:optimize-token-usage

@AdamGustavsson
Contributor

MCP tool responses are directly consumed by LLMs, making token count a critical factor. This commit significantly reduces token usage while maintaining full functionality.

Token Optimization Strategy:

  • Eliminate repetition in array responses (property_type, parent fields)
  • Use compact row format with simple arrays vs wrapped objects
  • Conditionally include fields only when populated (metadata, totals, etc.)
  • Strip redundant parent resource names
  • Add optional include_descriptions parameter (default: false)
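
The compact row format and conditional field inclusion described above could look roughly like the sketch below. It assumes the report has already been converted to a dict with snake_case keys (`dimension_headers`, `metric_headers`, `rows`); the function name `compact_report` and the output keys `columns` and `rows` are hypothetical and not taken from this PR.

```python
from typing import Any, Dict, List


def compact_report(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a report-style dict into one header list plus plain row arrays.

    Hypothetical helper: the input shape and the output keys are assumptions
    made for illustration, not the layout used by the PR.
    """
    dim_names = [h["name"] for h in response.get("dimension_headers", [])]
    metric_names = [h["name"] for h in response.get("metric_headers", [])]

    rows: List[List[str]] = []
    for row in response.get("rows", []):
        # Replace each {"value": ...} wrapper with the bare value.
        dims = [v.get("value", "") for v in row.get("dimension_values", [])]
        mets = [v.get("value", "") for v in row.get("metric_values", [])]
        rows.append(dims + mets)

    result: Dict[str, Any] = {"columns": dim_names + metric_names, "rows": rows}

    # Include optional sections only when they actually carry data.
    for key in ("totals", "metadata", "property_quota"):
        value = response.get(key)
        if value:
            result[key] = value
    return result
```

Column names appear once in `columns` instead of being implied by a wrapper object on every cell, and empty sections such as `totals` never reach the model.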

Response Format Changes:

  • get_account_summaries: Return compact format with simple IDs. Savings: ~40% (eliminates repeated property_type/parent for each property).
  • run_report/run_realtime_report: Compact rows, conditional field inclusion. Savings: ~30-50% (fewer wrapper objects, no empty fields).
  • get_custom_dimensions_and_metrics: Cleaner field names, optional descriptions. Savings: ~25% (descriptions excluded by default).
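
As an illustration of the get_account_summaries change, here is a minimal sketch that reduces account summaries to simple IDs and names. It assumes each summary is a dict with `account`, `display_name`, and `property_summaries` keys; the helper names and the output keys are hypothetical, and the PR's actual output may differ.

```python
from typing import Any, Dict, List


def strip_prefix(resource_name: str) -> str:
    """Return the trailing ID of a resource name such as 'properties/12345'."""
    return resource_name.rsplit("/", 1)[-1]


def compact_account_summaries(
    summaries: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Keep only IDs and display names (hypothetical output layout)."""
    compact = []
    for summary in summaries:
        compact.append(
            {
                "account_id": strip_prefix(summary.get("account", "")),
                "account_name": summary.get("display_name", ""),
                "properties": [
                    {
                        # property_type and parent are dropped on purpose:
                        # they repeat for every property under the account.
                        "id": strip_prefix(p.get("property", "")),
                        "name": p.get("display_name", ""),
                    }
                    for p in summary.get("property_summaries", [])
                ],
            }
        )
    return compact
```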

Schema Simplification:

  • Change property_id parameters to accept only numeric strings (e.g. '213025502')
  • Remove support for full resource names ('properties/12345')
  • This creates consistency: tools return IDs that other tools accept
  • Update construct_property_rn() to enforce numeric string format only
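
For reference, numeric-only validation could be sketched as follows. The function name comes from this PR, but the signature, the error message, and the use of a regular expression are assumptions; the real implementation may differ.

```python
import re


def construct_property_rn(property_id: str) -> str:
    """Build 'properties/<id>' from a numeric string (illustrative sketch).

    Assumed behavior: anything other than a non-empty string of ASCII digits
    is rejected, including full resource names like 'properties/12345'.
    """
    if not isinstance(property_id, str) or not re.fullmatch(r"[0-9]+", property_id):
        raise ValueError(
            f"property_id must be a numeric string such as '213025502', "
            f"got {property_id!r}"
        )
    return f"properties/{property_id}"
```

With this shape, an ID returned by one tool can be passed unchanged to another, and the `properties/` prefix is added in exactly one place.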

Additional Improvements:

  • Add default limit=100 to run_report and run_realtime_report. Prevents accidentally requesting massive responses.
  • Add automatic quota warning when API usage exceeds 90%. Helps prevent hitting quota limits unexpectedly.
  • Fix bug: Remove offset parameter from run_realtime_report. The Realtime API doesn't support pagination via offset. Attempting to use offset results in an 'Unknown field for RunRealtimeReportRequest: offset' error.
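
The 90% quota warning can be illustrated with a generic threshold check. This sketch uses a hypothetical input shape (`used` and `limit` per quota bucket) and a hypothetical function name; how the PR derives usage from the API's quota fields is not shown here and may work differently.

```python
from typing import Dict, List


def quota_warnings(
    usage: Dict[str, Dict[str, float]], threshold: float = 0.9
) -> List[str]:
    """Return one warning per bucket whose usage exceeds the threshold.

    The 'used'/'limit' keys are an assumed shape for illustration only.
    """
    warnings = []
    for bucket, status in usage.items():
        limit = status.get("limit", 0)
        if limit <= 0:
            # Skip buckets with no known limit to avoid dividing by zero.
            continue
        fraction = status.get("used", 0) / limit
        if fraction > threshold:
            warnings.append(
                f"Quota warning: {bucket} is at {fraction:.0%} of its limit"
            )
    return warnings
```

Returning a list of strings lets the caller attach warnings to a response only when there is something to report, in line with the conditional field inclusion used elsewhere in this change.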

Tests:

  • Add comprehensive quota warning tests (6 test cases)
  • Update construct_property_rn validation tests
  • All tests passing

Breaking Changes:

  • property_id parameters now require numeric strings only
  • Response formats are more compact (but contain same data)
  • Custom dimensions/metrics return different field names (api_name vs apiName)
  • run_realtime_report no longer accepts offset parameter

Fixes #69 and similar issues for other tool calls

@jradcliff
Member

Thanks for the pull request! Could you run nox -s format so the checks pass that phase?

https://github.com/googleanalytics/google-analytics-mcp/actions/runs/18111530090/job/51837936584?pr=71#step:5:1

@AdamGustavsson
Contributor Author

Sorry, I haven't used this type of formatter before. I hope it will pass now.

@jradcliff
Member

@AdamGustavsson, apologies for the delay on this review. I haven't forgotten about it and hope to get to it soon!

@matt-landers
Member

@AdamGustavsson I'm taking over this review. If you can resolve the conflicts, I'll review and get it merged.

@ZLeventer ZLeventer left a comment

This is a high-value change. MCP tool responses go straight into the LLM context window, and proto_to_dict() on raw GA responses dumps a lot of noise — resource name prefixes (properties/12345), empty fields, redundant parent references. Stripping those down to clean IDs and dropping unused fields directly reduces token consumption per tool call.

Specific improvements I like:

  1. get_account_summaries — Extracting just account_id, account_name, and a clean properties list is much better than dumping the raw protobuf. The raw response includes resource names, display names, internal URIs — most of which the LLM doesn't need for subsequent calls.

  2. property_id type change from int | str to str — Makes the tool interface cleaner. LLMs already pass strings, and the construct_property_rn utility handles the conversion internally anyway.

  3. Default limit: int = 100 — Smart default. Without a limit, a report on a high-cardinality dimension (like pagePath) can return thousands of rows, blowing up the context window. 100 is a reasonable default that covers most analytical queries.

  4. Removing parent from property details — Correct, it's redundant when you already know the property ID.

The docstring reformatting to shorter line widths also helps since these get injected into tool descriptions that the LLM reads.

Development

Successfully merging this pull request may close these issues.

Reduce the token count of get_account_summaries

4 participants