GH-48277: [C++][Parquet] unpack with shuffle algorithm#47994

Open
AntoinePrv wants to merge 81 commits into apache:main from AntoinePrv:new-bpacking

@AntoinePrv
Contributor

@AntoinePrv AntoinePrv commented Oct 29, 2025

Rationale for this change

The current bit-unpacking algorithm (which is implemented as a C++ code generator script in Python) does not fully leverage SIMD operations: all loads and some bitshifts use scalar operations, leaving performance on the table.
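
For readers unfamiliar with the operation, here is a minimal scalar sketch of what bit-unpacking computes. It is not Arrow's implementation: the name `UnpackScalar` is hypothetical, and the sketch assumes values are packed least-significant-bit first with a bit width between 1 and 32.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, not part of Arrow's API.
// Unpacks `count` values of `bit_width` bits each (1 <= bit_width <= 32),
// assumed to be stored LSB-first in `packed`, into 32-bit integers.
std::vector<uint32_t> UnpackScalar(const std::vector<uint8_t>& packed,
                                   int bit_width, std::size_t count) {
  std::vector<uint32_t> out(count);
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t bit_pos = i * static_cast<std::size_t>(bit_width);
    const std::size_t byte_pos = bit_pos / 8;
    const int shift = static_cast<int>(bit_pos % 8);
    // A value of up to 32 bits starting at a bit offset of up to 7 spans
    // at most 39 bits, so 5 bytes are enough. Bytes past the end of the
    // buffer are treated as zero.
    uint64_t window = 0;
    for (std::size_t b = 0; b < 5 && byte_pos + b < packed.size(); ++b) {
      window |= static_cast<uint64_t>(packed[byte_pos + b]) << (8 * b);
    }
    out[i] = static_cast<uint32_t>((window >> shift) & mask);
  }
  return out;
}
```

Every load and shift here is scalar and handles one value at a time, which is the kind of work a SIMD implementation would spread across the lanes of a register.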

What changes are included in this PR?

  1. Devise new bit-unpacking algorithms that fully leverage SIMD operations, for various parameter values of (packed bit width, destination integer width, SIMD register size). Different algorithms are necessary for different parameter values, because of straddling issues with some bit offsets.

  2. Implement these new algorithms entirely in C++ using metaprogramming: the tables necessary for efficient SIMD swizzling and shifting are computed at compile-time using constexpr code (the exception is AVX-512 which is still using the legacy Python code generation script).

  3. Implement low-level generic fallbacks for SIMD operations that are not available in all SIMD instruction sets, such as some flavors of bit-shifting. These fallbacks are also being contributed to xsimd, but have not been merged there yet.
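
The PR's actual tables and kernels are not reproduced in this description. As a rough illustration of the idea behind items 1 and 2, the sketch below computes per-lane byte offsets and bit shifts at compile time with `constexpr` code, then applies them in a scalar loop that stands in for a SIMD byte shuffle followed by a per-lane shift. All names (`LanePlan`, `MakeLanePlan`, `UnpackWithPlan`) are hypothetical, and the sketch assumes LSB-first packing, 8 output lanes of 32 bits, and C++17.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration; none of these names come from the PR.
struct LanePlan {
  std::array<uint8_t, 8> byte_offset;  // first source byte for each output lane
  std::array<uint8_t, 8> bit_shift;    // right shift applied after gathering
  bool fits_in_32_bits;                // false if a value straddles past 4 bytes
};

// Computes the plan entirely at compile time for a given packed bit width.
template <int kBitWidth>
constexpr LanePlan MakeLanePlan() {
  LanePlan plan{};
  plan.fits_in_32_bits = true;
  for (int lane = 0; lane < 8; ++lane) {
    const int bit_pos = lane * kBitWidth;
    plan.byte_offset[lane] = static_cast<uint8_t>(bit_pos / 8);
    plan.bit_shift[lane] = static_cast<uint8_t>(bit_pos % 8);
    // The value occupies bits [shift, shift + width) of the gathered word.
    if (bit_pos % 8 + kBitWidth > 32) plan.fits_in_32_bits = false;
  }
  return plan;
}

// Scalar stand-in for "shuffle bytes into lanes, then shift and mask".
// Each lane reads 4 bytes from its offset, so the caller must make sure
// that many bytes are readable (padding may be needed at the end).
template <int kBitWidth>
std::array<uint32_t, 8> UnpackWithPlan(const uint8_t* in) {
  static_assert(kBitWidth >= 1 && kBitWidth < 32, "unsupported bit width");
  constexpr LanePlan plan = MakeLanePlan<kBitWidth>();
  static_assert(plan.fits_in_32_bits, "this width needs a different algorithm");
  constexpr uint32_t mask = (uint32_t{1} << kBitWidth) - 1;
  std::array<uint32_t, 8> out{};
  for (std::size_t lane = 0; lane < 8; ++lane) {
    uint32_t word = 0;
    for (std::size_t b = 0; b < 4; ++b) {
      word |= static_cast<uint32_t>(in[plan.byte_offset[lane] + b]) << (8 * b);
    }
    out[lane] = (word >> plan.bit_shift[lane]) & mask;
  }
  return out;
}
```

In this sketch, `MakeLanePlan<27>()` reports `fits_in_32_bits == false` because the third value starts at bit offset 6 of its first byte and therefore spans 33 bits. That is one example of a straddling case where a single 32-bit gather per lane is not enough and a different strategy is required.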

Benchmark results on an AVX2 CPU (AMD Zen 2) on Linux (Ubuntu 24.04):

  • Parquet decoding:

    • between 30% and 60% faster on DELTA_BINARY_PACKED integer decoding
    • up to 25% faster on RLE boolean decoding
  • Parquet reading:

    • around 60% faster on BIT_PACKED levels reading
    • up to 70% faster on RLE levels reading
    • up to 20% faster on reading or skipping OPTIONAL and REPEATED records

Are these changes tested?

Yes, by the current extensive bit-unpacking and Parquet decoding tests.

Are there any user-facing changes?

No

@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@AntoinePrv AntoinePrv force-pushed the new-bpacking branch 4 times, most recently from d2743d4 to 6e72467, October 30, 2025 15:58
@AntoinePrv AntoinePrv force-pushed the new-bpacking branch 2 times, most recently from a7e4cd9 to 9efa59a, November 20, 2025 17:19
@AntoinePrv AntoinePrv force-pushed the new-bpacking branch 2 times, most recently from d01fdba to b28ea9b, November 27, 2025 09:57
@AntoinePrv AntoinePrv changed the title from "unpack with shuffle algorithm" to "GH-48277: [C++][Parquet] unpack with shuffle algorithm", Nov 27, 2025
@github-actions

⚠️ GitHub issue #48277 has been automatically assigned in GitHub to PR creator.

@AntoinePrv AntoinePrv marked this pull request as ready for review November 27, 2025 14:03
@AntoinePrv
Contributor Author

@pitrou apart from R-lint, this is looking pretty good.

@pitrou
Member

pitrou commented Nov 27, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit a4bfe8a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit a4bfe8a.

There were 37 benchmark results indicating a performance regression:

The full Conbench report has more details.

@AntoinePrv
Contributor Author

AntoinePrv commented Nov 28, 2025

@pitrou I'm running this locally, and I made an error when fixing the ASAN over-reading problem.
These latest benchmarks are not doing well.

@pitrou
Member

pitrou commented Nov 28, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit dd3ec0d. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit dd3ec0d.

There were 19 benchmark results indicating a performance regression:

The full Conbench report has more details.

@pitrou
Member

pitrou commented Dec 1, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit 408ef04. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou
Member

pitrou commented Dec 2, 2025

@ursabot please benchmark lang=C++

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 408ef04.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit 408ef04.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

@pitrou
Member

pitrou commented Feb 17, 2026

@github-actions crossbow submit -g cpp

@github-actions

Revision: b638570

Submitted crossbow builds: ursacomputing/crossbow @ actions-158ce97c66

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou added the "CI: Extra: C++" label (Run extra C++ CI), Feb 17, 2026
@pitrou
Member

pitrou commented Feb 17, 2026

Unfortunately the "AMD64 Windows R release" failure looks related to this PR: it's deterministic (I've restarted it twice), it doesn't occur on git main, and the test where it fails/crashes uses Parquet.

test_that("n_distinct() with many batches", {
  skip_if_not_available("parquet")
  tf <- tempfile()
  write_parquet(dplyr::starwars, tf, chunk_size = 20)
  ds <- open_dataset(tf)
  expect_equal(
    ds |> summarise(n_distinct(sex, na.rm = FALSE)) |> collect(),
    ds |> collect() |> summarise(n_distinct(sex, na.rm = FALSE))
  )
})

@pitrou
Member

pitrou commented Feb 17, 2026

@jonkeane How would you go about diagnosing an R test that crashes/stops with no error message on Windows (see above)?

@pitrou
Member

pitrou commented Feb 17, 2026

@ursabot please benchmark lang=C++

@rok
Member

rok commented Feb 17, 2026

Benchmark runs are scheduled for commit b638570. Watch https://buildkite.com/apache-arrow and https://conbench.arrow-dev.org for updates. A comment will be posted here when the runs are complete.

Labels

awaiting committer review, CI: Extra: C++, Component: C++

4 participants