
Add mutable bitwise operations to BooleanArray and NullBuffer::union_many #9692

Merged

alamb merged 9 commits into apache:main from mbutrovich:union_many on Apr 14, 2026



@mbutrovich (Contributor) commented Apr 10, 2026

Which issue does this PR close?

Rationale for this change

Several DataFusion PRs (#21464, #21468, #21471, #21475, #21477, #21482, #21532) optimize NULL handling in scalar functions by replacing row-by-row null buffer construction with bulk NullBuffer::union. When three or more null buffers need to be combined, these PRs chain binary union calls, and each call allocates a new BooleanBuffer.

NullBuffer::union_many reduces this to a single allocation: it clones one buffer and ANDs the others into it in place. For example, from #21482:

Before:

[array.nulls(), from_array.nulls(), to_array.nulls(), stride.and_then(|s| s.nulls())]
    .into_iter()
    .fold(None, |acc, nulls| NullBuffer::union(acc.as_ref(), nulls))

After:

NullBuffer::union_many([
    array.nulls(),
    from_array.nulls(),
    to_array.nulls(),
    stride.and_then(|s| s.nulls()),
])
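As a rough illustration of why this saves allocations, the sketch below models a null buffer as a hypothetical Validity type (a Vec<u64> bitmap where a set bit means "valid"). It is not the arrow-rs NullBuffer; the union and union_many functions here only mimic the shapes described above. The pairwise version returns a fresh bitmap on every call, while the many-way version clones once and ANDs the remaining bitmaps in place.

```rust
/// Hypothetical stand-in for a validity bitmap: a set bit means "valid" (non-null).
/// This is NOT arrow-rs's NullBuffer; it only models the allocation pattern.
#[derive(Clone, Debug, PartialEq)]
pub struct Validity {
    pub words: Vec<u64>,
}

/// Pairwise union of nulls, shaped like NullBuffer::union: a slot stays valid only
/// if it is valid in both inputs. Allocates a new bitmap whenever any input is present.
pub fn union(lhs: Option<&Validity>, rhs: Option<&Validity>) -> Option<Validity> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(Validity {
            words: l.words.iter().zip(&r.words).map(|(a, b)| a & b).collect(),
        }),
        (Some(v), None) | (None, Some(v)) => Some(v.clone()),
        (None, None) => None,
    }
}

/// Many-way union: clone the first present bitmap once, then AND every other
/// present bitmap into it in place, so only the initial clone allocates.
pub fn union_many<'a>(nulls: impl IntoIterator<Item = Option<&'a Validity>>) -> Option<Validity> {
    let mut acc: Option<Validity> = None;
    // `flatten` skips the `None` inputs (arrays without a null buffer).
    for v in nulls.into_iter().flatten() {
        match acc.as_mut() {
            // In-place AND: no new allocation.
            Some(a) => a.words.iter_mut().zip(&v.words).for_each(|(x, y)| *x &= *y),
            // First present bitmap: the single allocation.
            None => acc = Some(v.clone()),
        }
    }
    acc
}
```

In this model, every pairwise step in the fold produces a new Vec, whereas union_many allocates only for the initial clone; both produce the same result.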

Per @alamb's suggestion, this PR also implements the general-purpose mutable bitwise operations on BooleanArray from #8809, following the PrimitiveArray::unary / unary_mut pattern. This builds on the BitAndAssign/BitOrAssign/BitXorAssign operators added to BooleanBuffer in #9567.

What changes are included in this PR?

NullBuffer::union_many(impl IntoIterator<Item = Option<&NullBuffer>>): combines multiple null buffers in a single allocation (clone + in-place &=). Used by DataFusion for bulk null handling.

BooleanArray bitwise operations (6 new public methods):

Unary (op: FnMut(u64) -> u64):

  • bitwise_unary(&self, op) — always allocates a new array
  • bitwise_unary_mut(self, op) -> Result<Self, Self> — in-place if uniquely owned, Err(self) if shared
  • bitwise_unary_mut_or_clone(self, op) — in-place if uniquely owned, allocates if shared
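The ownership semantics of the three unary variants can be sketched with a hypothetical Bits type that keeps its words behind an Arc and uses Arc::get_mut to detect unique ownership. This is a simplified model in the spirit of the PrimitiveArray::unary / unary_mut pattern, not the arrow-rs implementation, which works with its own buffer types.

```rust
use std::sync::Arc;

/// Hypothetical boolean bitmap backed by a shared, reference-counted word buffer.
/// Models only the ownership behavior of the unary variants, not arrow-rs internals.
#[derive(Clone, Debug)]
pub struct Bits {
    words: Arc<Vec<u64>>,
}

impl Bits {
    pub fn new(words: Vec<u64>) -> Self {
        Self { words: Arc::new(words) }
    }

    pub fn words(&self) -> &[u64] {
        &self.words
    }

    /// True if both handles point at the same allocation.
    pub fn ptr_eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.words, &other.words)
    }

    /// Always allocates a new buffer (analogous to bitwise_unary).
    pub fn unary(&self, mut op: impl FnMut(u64) -> u64) -> Self {
        Self::new(self.words.iter().map(|w| op(*w)).collect())
    }

    /// Mutates in place if this handle is the sole owner; otherwise returns
    /// `Err(self)` unchanged (analogous to bitwise_unary_mut).
    pub fn unary_mut(mut self, mut op: impl FnMut(u64) -> u64) -> Result<Self, Self> {
        match Arc::get_mut(&mut self.words) {
            Some(words) => {
                words.iter_mut().for_each(|w| *w = op(*w));
                Ok(self)
            }
            None => Err(self),
        }
    }

    /// In place when uniquely owned, otherwise falls back to allocating
    /// (analogous to bitwise_unary_mut_or_clone).
    pub fn unary_mut_or_clone(self, mut op: impl FnMut(u64) -> u64) -> Self {
        match self.unary_mut(&mut op) {
            Ok(done) => done,
            Err(shared) => shared.unary(op),
        }
    }
}
```

Returning Err(self) instead of silently copying lets the caller decide whether a copy is acceptable; the _or_clone variant is the convenience wrapper for callers who always want a result.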

Binary (op: FnMut(u64, u64) -> u64):

  • bitwise_bin_op(&self, rhs, op) — always allocates, unions null buffers
  • bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self> — in-place if uniquely owned, Err(self) if shared, unions null buffers
  • bitwise_bin_op_mut_or_clone(self, rhs, op) — in-place if uniquely owned, allocates if shared, unions null buffers
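For the binary variants, "unions null buffers" means ANDing the validity bitmaps: a result slot is null if it is null in either input. A minimal allocating sketch, using a hypothetical NullableBits type whose buffers are word-aligned at offset 0, might look like this:

```rust
/// Hypothetical nullable boolean array: `values` holds the bits, and `validity`
/// (if present) marks each slot as valid (1) or null (0). Offset 0, whole u64 words.
#[derive(Clone, Debug, PartialEq)]
pub struct NullableBits {
    pub values: Vec<u64>,
    pub validity: Option<Vec<u64>>,
}

/// Applies `op` word by word to the values and unions the nulls of both inputs.
pub fn bin_op(
    lhs: &NullableBits,
    rhs: &NullableBits,
    mut op: impl FnMut(u64, u64) -> u64,
) -> NullableBits {
    assert_eq!(lhs.values.len(), rhs.values.len(), "inputs must have the same number of words");
    let values: Vec<u64> = lhs.values.iter().zip(&rhs.values).map(|(l, r)| op(*l, *r)).collect();
    // Null if null on either side: AND the validity bitmaps; a missing bitmap means "all valid".
    let validity: Option<Vec<u64>> = match (&lhs.validity, &rhs.validity) {
        (Some(l), Some(r)) => Some(l.iter().zip(r).map(|(a, b)| a & b).collect()),
        (Some(v), None) | (None, Some(v)) => Some(v.clone()),
        (None, None) => None,
    };
    NullableBits { values, validity }
}
```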

Note: #8809 proposed the binary variants take a raw buffer and right_offset_in_bits. This PR takes &BooleanArray instead, which encapsulates both and matches existing patterns like BooleanArray::from_binary.
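Taking &BooleanArray matters because each side can carry its own bit offset (for example, after slicing). When the left and right offsets differ, 64-bit words have to be reassembled across word boundaries before op can be applied. The helpers below, word_at and bin_op_with_offsets, are hypothetical and sketch that realignment for LSB-first bitmaps; arrow-rs uses its own bit-chunk iterators for this.

```rust
/// Returns the 64 bits starting at `bit_offset` in an LSB-first bitmap stored as
/// u64 words, treating bits past the end of `words` as zero. Hypothetical helper.
pub fn word_at(words: &[u64], bit_offset: usize) -> u64 {
    let idx = bit_offset / 64;
    let shift = bit_offset % 64;
    let lo = words.get(idx).copied().unwrap_or(0);
    if shift == 0 {
        // Aligned read; also avoids an overflowing `hi << 64` below.
        return lo;
    }
    let hi = words.get(idx + 1).copied().unwrap_or(0);
    (lo >> shift) | (hi << (64 - shift))
}

/// Combines `len` bits of two bitmaps that start at (possibly different) bit offsets,
/// producing an output aligned at offset 0. Hypothetical helper.
pub fn bin_op_with_offsets(
    left: &[u64],
    left_offset: usize,
    right: &[u64],
    right_offset: usize,
    len: usize,
    mut op: impl FnMut(u64, u64) -> u64,
) -> Vec<u64> {
    let n_words = (len + 63) / 64;
    let mut out: Vec<u64> = (0..n_words)
        .map(|i| op(word_at(left, left_offset + i * 64), word_at(right, right_offset + i * 64)))
        .collect();
    // Clear the bits past `len` in the last word so the result does not depend on padding.
    let tail = len % 64;
    if tail != 0 {
        if let Some(last) = out.last_mut() {
            *last &= (1u64 << tail) - 1;
        }
    }
    out
}
```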

Are these changes tested?

Yes. 23 tests for the BooleanArray bitwise methods and 6 tests for union_many, covering:

  • Basic correctness (AND, OR, NOT)
  • Null handling (both nullable, one nullable, no nulls, null union)
  • Buffer ownership (uniquely owned → in-place, shared → Err / fallback)
  • Edge cases (empty arrays, sliced arrays with non-zero offset, misaligned left/right offsets)

Are there any user-facing changes?

Six new public methods on BooleanArray and one new public method on NullBuffer.

github-actions bot added the arrow label (Changes to the arrow crate) on Apr 10, 2026
@neilconway

Nice! I noticed this as well; it should be a nice win.

@alamb (Contributor) commented Apr 10, 2026

Thank you @mbutrovich and @neilconway

Instead of a new kernel, would you be willing to implement this instead?

I think it would take a little more work, but would be more general

@mbutrovich (Contributor Author)


I'll take a look!

mbutrovich changed the title from "Add BooleanBuffer::bitand_many and NullBuffer::union_many" to "Add mutable bitwise operations to BooleanArray and NullBuffer::union_many" on Apr 10, 2026
@alamb (Contributor) commented Apr 13, 2026

Thank you for this PR -- I plan to review it more carefully today

@alamb (Contributor) left a comment

Thank you so much @mbutrovich -- this is so nice ❤️


I pushed a small commit to this PR to

  1. add a few other test cases
  2. verify that buffers are actually the same/different

Comment thread arrow-buffer/src/buffer/null.rs Outdated
/// Computes the union of the nulls in multiple optional [`NullBuffer`]s
///
/// See [`union`](Self::union)
pub fn union_many(nulls: &[Option<&NullBuffer>]) -> Option<NullBuffer> {
Contributor

This might be more generic if we had it take the iterator so it wouldn't require a slice

    pub fn union_many<'a>(nulls: impl IntoIterator<Item = Option<&'a NullBuffer>>) -> Option<NullBuffer> {
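To illustrate the trade-off being discussed, here is a sketch comparing the two signatures with a hypothetical Nulls type (a single u64 mask, so there is no allocation to worry about). The iterator-based form accepts arrays, Vecs, and lazy iterator chains directly, and the slice-based form can simply delegate to it.

```rust
/// Hypothetical stand-in for a null buffer: one u64 validity mask (1 = valid).
#[derive(Debug, PartialEq)]
pub struct Nulls(pub u64);

/// Iterator-based signature: accepts any iterable of optional references.
pub fn union_many_iter<'a>(nulls: impl IntoIterator<Item = Option<&'a Nulls>>) -> Option<Nulls> {
    nulls
        .into_iter()
        .flatten() // skip inputs that have no null buffer
        .fold(None::<Nulls>, |acc, n| Some(Nulls(acc.map_or(n.0, |a| a.0 & n.0))))
}

/// Slice-based signature: callers must first materialize a slice.
pub fn union_many_slice(nulls: &[Option<&Nulls>]) -> Option<Nulls> {
    union_many_iter(nulls.iter().copied())
}
```

Because a slice converts to the iterator form with .iter().copied(), the iterator signature is the more flexible of the two at the call site, at the cost of a generic parameter and a named lifetime.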

Contributor Author

IIRC I tried this at one point, but the current design seemed more idiomatic to the codebase. Happy to go with whatever makes sense to you.

Contributor Author

I'll make the change real fast.

Contributor

I also think a slice is fine

Comment thread arrow-buffer/src/buffer/null.rs
@alamb (Contributor) commented Apr 13, 2026

I should probably note that the BooleanArray API additions are in addition to NullBuffer::union_many -- they are not required to be in the same PR.

@alamb (Contributor) commented Apr 13, 2026

(I am not asking for changes here I am just observing)

@alamb (Contributor) left a comment

Looks great -- thanks again @mbutrovich

alamb merged commit bfee844 into apache:main on Apr 14, 2026
26 checks passed
mbutrovich deleted the union_many branch on April 14, 2026 17:32

Labels

arrow Changes to the arrow crate


Development

Successfully merging this pull request may close these issues.

[Arrow] Add bitwise operations BooleanArray that potentially reuse the underlying allocation

3 participants