GH-48277: [C++][Parquet] unpack with shuffle algorithm by AntoinePrv · Pull Request #47994 · apache/arrow

AntoinePrv · 2025-10-29T14:23:23Z

Rationale for this change

The current bit-unpacking algorithm (which is implemented as a C++ code generator script in Python) does not fully leverage SIMD operations: all loads and some bitshifts use scalar operations, leaving performance on the table.
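
For reference, the operation being accelerated can be written as a plain scalar loop. The sketch below is not the code from this PR or from the generated kernels: `UnpackAt` and `UnpackScalar` are hypothetical names, and it assumes an LSB-first layout in which value `i` starts at bit `i * bit_width` of the input buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference, for illustration only.
// Extracts the i-th value of `bit_width` bits (1 <= bit_width <= 32) from an
// LSB-first packed buffer. Reads only the bytes that the value actually covers.
constexpr std::uint32_t UnpackAt(const std::uint8_t* in, std::size_t i,
                                 unsigned bit_width) {
  const std::uint64_t mask = (std::uint64_t{1} << bit_width) - 1;
  const std::size_t bit_pos = i * bit_width;
  const std::size_t first_byte = bit_pos / 8;
  const unsigned shift = static_cast<unsigned>(bit_pos % 8);
  // A value may straddle up to 5 bytes: shift <= 7 and bit_width <= 32.
  const std::size_t n_bytes = (shift + bit_width + 7) / 8;
  std::uint64_t word = 0;
  for (std::size_t b = 0; b < n_bytes; ++b) {
    word |= static_cast<std::uint64_t>(in[first_byte + b]) << (8 * b);
  }
  return static_cast<std::uint32_t>((word >> shift) & mask);
}

// Unpacks `count` consecutive values, one scalar load-shift-mask at a time.
constexpr void UnpackScalar(const std::uint8_t* in, std::uint32_t* out,
                            std::size_t count, unsigned bit_width) {
  for (std::size_t i = 0; i < count; ++i) {
    out[i] = UnpackAt(in, i, bit_width);
  }
}
```

With a bit width of 3, the three bytes `0x88 0xC6 0xFA` hold the values 0 through 7 under this layout, which makes a convenient hand-checkable test vector.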

What changes are included in this PR?

Devise new bit-unpacking algorithms that fully leverage SIMD operations, for various parameter values of (packed bit width, destination integer width, SIMD register size). Different algorithms are necessary for different parameter values, because of straddling issues with some bit offsets.
Implement these new algorithms entirely in C++ using metaprogramming: the tables necessary for efficient SIMD swizzling and shifting are computed at compile-time using constexpr code (the exception is AVX-512 which is still using the legacy Python code generation script).
Implement low-level generic fallbacks for SIMD operations that are not available in all SIMD instruction sets, such as some flavors of bit-shifting. These fallbacks are also being contributed to xsimd, but have not been merged there yet.
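
To illustrate the idea of compile-time swizzle and shift tables, here is a minimal sketch that models a 128-bit register as four 32-bit lanes in portable scalar code. It is not the PR's implementation: `UnpackPlan`, `MakePlan` and `UnpackLane` are hypothetical names, and the real kernels operate on SIMD registers rather than one lane at a time. On x86 the two steps would roughly correspond to a byte shuffle (for example `_mm_shuffle_epi8`) followed by a per-lane variable shift (for example `_mm_srlv_epi32`, which requires AVX2).

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;      // four 32-bit lanes, as in a 128-bit register
constexpr std::size_t kRegBytes = 16;  // bytes in the modeled register

// Hypothetical plan: which input byte feeds each register byte, and how far
// each lane must be shifted right once its bytes are in place.
struct UnpackPlan {
  std::uint8_t shuffle[kRegBytes] = {};
  std::uint8_t shifts[kLanes] = {};
};

// Computes the plan for one bit width. Being constexpr, it can be evaluated
// entirely at compile time, so no table has to be generated by a script.
constexpr UnpackPlan MakePlan(unsigned bit_width) {
  UnpackPlan plan{};
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    const std::size_t bit_pos = lane * bit_width;
    plan.shifts[lane] = static_cast<std::uint8_t>(bit_pos % 8);
    for (std::size_t b = 0; b < 4; ++b) {
      plan.shuffle[4 * lane + b] = static_cast<std::uint8_t>(bit_pos / 8 + b);
    }
  }
  return plan;
}

// Scalar model of "shuffle bytes into the lane, shift right, mask".
constexpr std::uint32_t UnpackLane(const std::uint8_t* in, const UnpackPlan& plan,
                                   std::size_t lane, unsigned bit_width) {
  std::uint32_t word = 0;
  for (std::size_t b = 0; b < 4; ++b) {
    word |= static_cast<std::uint32_t>(in[plan.shuffle[4 * lane + b]]) << (8 * b);
  }
  const std::uint32_t mask =
      bit_width >= 32 ? ~std::uint32_t{0} : (std::uint32_t{1} << bit_width) - 1;
  return (word >> plan.shifts[lane]) & mask;
}

// Plan for 3-bit values, built by the compiler.
constexpr UnpackPlan kPlan3 = MakePlan(3);
```

In this model a lane gathers 4 bytes, so a value is fully contained whenever its shift plus the bit width is at most 32. Since the shift is at most 7, any bit width up to 25 is safe; wider values can straddle a fifth byte at some offsets, which is one way to see why a single algorithm does not cover every parameter combination.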

Benchmark results on an AVX2 CPU (AMD Zen 2) on Linux (Ubuntu 24.04):

Parquet decoding:
- between 30% and 60% faster on DELTA_BINARY_PACKED integer decoding
- up to 25% faster on RLE boolean decoding
Parquet reading:
- around 60% faster on BIT_PACKED levels reading
- up to 70% faster on RLE levels reading
- up to 20% faster on reading or skipping OPTIONAL and REPEATED records

Are these changes tested?

Yes, by the current extensive bit-unpacking and Parquet decoding tests.

Are there any user-facing changes?

No

GitHub Issue: [C++][Parquet] Better simd unpack algorithm #48277

github-actions · 2025-10-29T14:23:48Z

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

github-actions · 2025-11-27T13:49:50Z

⚠️ GitHub issue #48277 has been automatically assigned in GitHub to PR creator.

AntoinePrv · 2025-11-27T18:00:18Z

@pitrou apart from R-lint, this is looking pretty good.

pitrou · 2025-11-27T18:27:05Z

@ursabot please benchmark lang=C++

voltrondatabot · 2025-11-27T18:27:11Z

Benchmark runs are scheduled for commit a4bfe8a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

conbench-apache-arrow · 2025-11-27T20:52:09Z

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit a4bfe8a.

There were 37 benchmark results indicating a performance regression:

Pull Request Run on amd64-c6a-4xlarge-linux at 2025-11-27 19:32:37Z
- BM_UnpackUint64 (C++) with params=DynamicAligned/47/64, source=cpp-micro, suite=arrow-bpacking-benchmark
- IsInInt64SmallSet (C++) with params=64, source=cpp-micro, suite=arrow-compute-scalar-set-lookup-benchmark
and 35 more (see the report linked below)

The full Conbench report has more details.

AntoinePrv · 2025-11-28T09:31:59Z

@pitrou I'm running this locally, and I made an error when fixing the ASAN over-reading problem.
These latest benchmarks are not doing well.
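
As background on why an over-read guard can interact with performance: a full-width SIMD load typically reads more bytes than one iteration consumes, so the last iterations may touch memory past the end of the buffer unless they are handed to a slower tail path. The sketch below shows one common way to compute how many iterations may safely use full-width loads. It is an assumption-laden illustration, not the guard used in this PR, and `SafeSimdIterations` and its parameters are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical guard, for illustration only.
// Iteration k loads `bytes_loaded` bytes starting at offset k * bytes_consumed,
// where bytes_loaded >= bytes_consumed > 0. Returns how many iterations keep
// every load inside the first `available_bytes` bytes of the buffer.
constexpr std::size_t SafeSimdIterations(std::size_t available_bytes,
                                         std::size_t bytes_consumed,
                                         std::size_t bytes_loaded) {
  if (available_bytes < bytes_loaded) {
    return 0;  // not even one full-width load fits
  }
  // Largest k with k * bytes_consumed + bytes_loaded <= available_bytes, plus one.
  return (available_bytes - bytes_loaded) / bytes_consumed + 1;
}
```

A guard that is too conservative sends more values than necessary through the tail path, while one that is too permissive reads out of bounds, so an off-by-one in either direction shows up as a benchmark regression or as a sanitizer failure.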

pitrou · 2025-11-28T14:05:50Z

@ursabot please benchmark lang=C++

voltrondatabot · 2025-11-28T14:05:56Z

Benchmark runs are scheduled for commit dd3ec0d. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

conbench-apache-arrow · 2025-11-28T18:08:56Z

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit dd3ec0d.

There were 19 benchmark results indicating a performance regression:

Pull Request Run on amd64-c6a-4xlarge-linux at 2025-11-28 15:12:47Z
- BM_UnpackUint32 (C++) with params=DynamicUnaligned/20/64, source=cpp-micro, suite=arrow-bpacking-benchmark
- BM_DeltaLengthDecodingByteArray (C++) with params=max-string-length:8/batch-size:2048, source=cpp-micro, suite=parquet-encoding-benchmark
and 17 more (see the report linked below)

The full Conbench report has more details.

pitrou · 2025-12-01T15:57:57Z

@ursabot please benchmark lang=C++

voltrondatabot · 2025-12-01T15:58:04Z

Benchmark runs are scheduled for commit 408ef04. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

pitrou · 2025-12-02T15:07:06Z

@ursabot please benchmark lang=C++

conbench-apache-arrow · 2025-12-10T19:17:07Z

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 408ef04.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

conbench-apache-arrow · 2025-12-13T15:40:26Z

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 408ef04.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

pitrou · 2026-02-17T14:09:02Z

@github-actions crossbow submit -g cpp

github-actions · 2026-02-17T14:11:30Z

Revision: b638570

Submitted crossbow builds: ursacomputing/crossbow @ actions-158ce97c66

Task	Status
example-cpp-minimal-build-static
example-cpp-minimal-build-static-system-dependency
example-cpp-tutorial
test-build-cpp-fuzz
test-conda-cpp
test-conda-cpp-valgrind
test-debian-13-cpp-amd64
test-debian-13-cpp-i386
test-debian-experimental-cpp-gcc-15
test-fedora-42-cpp
test-ubuntu-22.04-cpp
test-ubuntu-22.04-cpp-20
test-ubuntu-22.04-cpp-bundled
test-ubuntu-22.04-cpp-emscripten
test-ubuntu-22.04-cpp-no-threading
test-ubuntu-24.04-cpp
test-ubuntu-24.04-cpp-bundled-offline
test-ubuntu-24.04-cpp-gcc-13-bundled
test-ubuntu-24.04-cpp-gcc-14
test-ubuntu-24.04-cpp-minimal-with-formats
test-ubuntu-24.04-cpp-thread-sanitizer

pitrou · 2026-02-17T16:47:45Z

Unfortunately the "AMD64 Windows R release" failure looks related to this PR: it's deterministic (I've restarted it twice), it doesn't occur on git main, and the test where it fails/crashes uses Parquet.

arrow/r/tests/testthat/test-dplyr-summarize.R

Lines 260 to 271 in ebaaf07

    
    test_that("n_distinct() with many batches", {
      skip_if_not_available("parquet")
      tf <- tempfile()
      write_parquet(dplyr::starwars, tf, chunk_size = 20)
      ds <- open_dataset(tf)
      expect_equal(
        ds |> summarise(n_distinct(sex, na.rm = FALSE)) |> collect(),
        ds |> collect() |> summarise(n_distinct(sex, na.rm = FALSE))
      )
    })

pitrou · 2026-02-17T16:50:56Z

@jonkeane How would you go about diagnosing an R test that crashes/stops with no error message on Windows (see above)?

pitrou · 2026-02-17T18:06:48Z

@ursabot please benchmark lang=C++

rok · 2026-02-17T18:06:55Z

Benchmark runs are scheduled for commit b638570. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

github-actions bot added the Component: C++ and awaiting review labels Oct 29, 2025

AntoinePrv force-pushed the new-bpacking branch 4 times, most recently from d2743d4 to 6e72467 Compare October 30, 2025 15:58

AntoinePrv force-pushed the new-bpacking branch 2 times, most recently from a7e4cd9 to 9efa59a Compare November 20, 2025 17:19

AntoinePrv force-pushed the new-bpacking branch 2 times, most recently from d01fdba to b28ea9b Compare November 27, 2025 09:57

AntoinePrv changed the title ~~unpack with shuffle algorithm~~ GH-48277: [C++][Parquet] unpack with shuffle algorithm Nov 27, 2025

AntoinePrv force-pushed the new-bpacking branch from f546ed9 to 4f9fbe1 Compare November 27, 2025 14:03

AntoinePrv marked this pull request as ready for review November 27, 2025 14:03

AntoinePrv force-pushed the new-bpacking branch from a4bfe8a to dd3ec0d Compare November 28, 2025 10:46

pitrou mentioned this pull request Dec 17, 2025

GH-45847: [C++] Optimize Parquet column reader by fusing decoding and counting #48549

Open

AntoinePrv and others added 22 commits February 17, 2026 15:05

Fix pessimit overeading guard

3e86901

Fix overreading guard comparison

6a61a87

Add UnpackOptions and max_read_bytes

5a000fc

Use C++20 NTTP

e6e097a

xsimd 14.0 compatibility

743577f

fmt

cecd14f

C++20 NTTP options

c0ee9d5

Homogenous wording

1cff8bc

Remove xsimd backward compatibility

57f278b

Apply doc fixes from code review

43c8694

Co-authored-by: Antoine Pitrou <pitrou@free.fr>

Documentation and code improvements

251437e

Move utilities into bpacking sub ns

5ed131a

Refactor plan builders

f64ad1d

Move utilities

8ea86be

Kernel documentation

e68f936

adjust_bytes_per_read doc

3c27968

Fewer typename

d050ee1

Add documentation

1329b39

Fix bounds in plan builders

19a32e3

Change names

b431b19

Add extra comments

22dff86

Fix comments

b638570

pitrou force-pushed the new-bpacking branch from 97d87e8 to b638570 Compare February 17, 2026 14:07

pitrou added the CI: Extra: C++ label Feb 17, 2026
