Arrow IPC binary fetch path for DataFrame execution by Martozar · Pull Request #1489 · gooddata/gooddata-python-sdk

Martozar · 2026-03-30T08:02:12Z

Summary

Adds a native Arrow IPC binary fetch path to gooddata-pandas, providing a faster alternative to the existing JSON-paged AFM path for large result sets.

What changed

gooddata-sdk — binary fetch

BareExecutionResponse.read_result_arrow() fetches execution results from the server's binary IPC endpoint and returns a pyarrow.Table.

gooddata-pandas — Arrow→DataFrame conversion

DataFrameFactory.for_exec_def_arrow() — new public method that mirrors for_exec_def() but uses the binary path.
for_arrow_table() — pure conversion from pa.Table to (pd.DataFrame, DataFrameMetadata), enabling callers to bring their own Arrow data.
convert_arrow_table_to_dataframe() — low-level converter that reconstructs row/column MultiIndex, subtotals, primary labels, and types from Arrow field metadata.

Why

The JSON paging path serialises every result to JSON and pages it in chunks, which is CPU-heavy and slow for wide or deep result sets. Arrow IPC transfers binary columnar data in a single round-trip. End-to-end benchmarks against the GoodData demo workspace show a 1.3×–33× speedup depending on table shape, with larger tables benefiting most.

Test coverage

140 unit tests covering: missing metadata keys (all three required keys), self_destruct mode, _build_field_index edge cases (subtotal padding, asymmetric depth), compute_row_totals_indexes with empty dimensions, for_arrow_table correctness across flat/transposed/subtotals/both-dim-totals cases.
47 ground-truth fixture cases generated against the live API and committed to tests/dataframe/fixtures/arrow/, including 3-metric tables, 3-level nested subtotals, multi-aggregation multi-metric tables, and asymmetric totals (different levels/aggregations per metric).
IPC test fixture updated to use ipc.new_file to match the server format.

codecov · 2026-03-30T09:31:36Z

Codecov Report

❌ Patch coverage is 91.55405% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.38%. Comparing base (49ea0d5) to head (7dddd3a).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
...s/gooddata-pandas/src/gooddata_pandas/dataframe.py	46.42%	15 Missing ⚠️
...ta-sdk/src/gooddata_sdk/compute/model/execution.py	76.47%	4 Missing ⚠️
...es/gooddata-pandas/src/gooddata_pandas/__init__.py	50.00%	3 Missing ⚠️
...data-pandas/src/gooddata_pandas/arrow_convertor.py	99.12%	2 Missing ⚠️
...ddata-sdk/src/gooddata_sdk/compute/model/filter.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1489      +/-   ##
==========================================
+ Coverage   78.13%   78.38%   +0.25%     
==========================================
  Files         228      230       +2     
  Lines       14926    15212     +286     
==========================================
+ Hits        11662    11924     +262     
- Misses       3264     3288      +24

hkad98 · 2026-03-30T11:49:44Z

packages/gooddata-sdk/pyproject.toml

 ]

+[project.optional-dependencies]
+arrow = ["pyarrow>=16.1.0"]


Nitpick: this is a new dependency. Consider setting a higher minimum version, e.g. pyarrow>=23.0.1.
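Because pyarrow is declared as an optional extra, code that needs it typically imports it lazily and reports a useful install hint when it is missing. A minimal sketch of that pattern follows; the helper `require_optional` is hypothetical and is not taken from this PR.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str) -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install '{package}[{extra}]'"
        ) from exc
```

A caller would then write something like `pa = require_optional("pyarrow", extra="arrow", package="gooddata-sdk")` at the point where the Arrow path is first used, so users of the JSON path are unaffected.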

packages/gooddata-pandas/src/gooddata_pandas/dataframe.py

packages/gooddata-sdk/src/gooddata_sdk/catalog/export/service.py

packages/gooddata-sdk/src/gooddata_sdk/compute/model/execution.py

Switch read_result_arrow to explicitly request application/vnd.apache.arrow.stream via Accept header and pipe the HTTP response directly into ipc.open_stream(), eliminating the intermediate BytesIO buffer. Update tests accordingly.

Martozar requested review from hkad98, jaceksan, lupko and pcerny as code owners March 30, 2026 08:02

Martozar force-pushed the c.mze-cq-105 branch 3 times, most recently from 7453528 to 0380d40 Compare March 30, 2026 09:22

Martozar marked this pull request as draft March 30, 2026 10:47

hkad98 reviewed Mar 30, 2026

no23reason reviewed Mar 31, 2026

packages/gooddata-pandas/src/gooddata_pandas/dataframe.py (resolved)

Martozar changed the title ~~C.mze cq 105~~ Arrow IPC binary fetch path for DataFrame execution Apr 1, 2026

Martozar marked this pull request as ready for review April 1, 2026 11:01

no23reason reviewed Apr 1, 2026

packages/gooddata-sdk/src/gooddata_sdk/catalog/export/service.py (resolved)

no23reason reviewed Apr 1, 2026

packages/gooddata-sdk/src/gooddata_sdk/compute/model/execution.py (outdated, resolved)

Martozar force-pushed the c.mze-cq-105 branch 2 times, most recently from d7fbc76 to 4e99271 Compare April 1, 2026 13:54

Martozar added 4 commits April 1, 2026 15:57

feat(gooddata-pandas): add Arrow IPC execution path via for_exec_def_arrow

b7d2744

fix(gooddata-sdk): update type annotations for ty 0.0.27 and fix Arrow result reading

6a20f1c

docs(export): fix get_raw_export_bytes docstring to be format-agnostic

7dddd3a

Martozar force-pushed the c.mze-cq-105 branch from 4e99271 to 7dddd3a Compare April 1, 2026 14:00

no23reason approved these changes Apr 1, 2026

Martozar wants to merge 4 commits into gooddata:master from Martozar:c.mze-cq-105

3 participants
