Add mutable bitwise operations to BooleanArray and NullBuffer::union_many #9692
alamb merged 9 commits into apache:main
Nice! I noticed this as well, should be a nice win.
Thank you @mbutrovich and @neilconway. Instead of a new kernel, would you be willing to implement this? I think it would take a little more work, but would be more general.
I'll take a look!
…anBuffer's bitand_many and associated tests.
Title changed from "BooleanBuffer::bitand_many and NullBuffer::union_many" to "BooleanArray and NullBuffer::union_many"
Thank you for this PR -- I plan to review it more carefully today
alamb
left a comment
Thank you so much @mbutrovich -- this is so nice ❤️
I pushed a small commit to this PR to:
- add a few other test cases
- verify that buffers are actually the same/different
    /// Computes the union of the nulls in multiple optional [`NullBuffer`]s
    ///
    /// See [`union`](Self::union)
    pub fn union_many(nulls: &[Option<&NullBuffer>]) -> Option<NullBuffer> {
This might be more generic if it took an iterator, so that it wouldn't require a slice:
    pub fn union_many(nulls: impl IntoIterator<Item = Option<&NullBuffer>>) -> Option<NullBuffer> {
IIRC I tried this at one point, but the current design seemed more idiomatic to the codebase. Happy to go with whatever makes sense to you.
I'll make the change real fast.
I also think a slice is fine
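To make the slice-versus-iterator trade-off in this thread concrete, here is a minimal, std-only sketch. It is not the arrow-rs implementation: `Validity` and `union_many_sketch` are hypothetical stand-ins, the validity mask is a plain `Vec<u64>` with 1 meaning "valid", and offsets and lengths are ignored. It also illustrates the "clone once, then AND in place" idea.

```rust
/// Hypothetical stand-in for a null buffer: one bit per row, 1 = valid.
#[derive(Clone, Debug, PartialEq)]
struct Validity {
    words: Vec<u64>,
}

/// Toy analogue of a `union_many`-style function. `None` means "no nulls".
/// Clones the first mask that is present, then ANDs the rest into it in place.
fn union_many_sketch<'a>(
    masks: impl IntoIterator<Item = Option<&'a Validity>>,
) -> Option<Validity> {
    // Inputs without nulls cannot change the result, so skip them.
    let mut present = masks.into_iter().flatten();
    // The only allocation in this sketch: a clone of the first mask.
    let mut acc = present.next()?.clone();
    for m in present {
        assert_eq!(acc.words.len(), m.words.len(), "masks must cover the same rows");
        for (a, b) in acc.words.iter_mut().zip(&m.words) {
            *a &= *b; // a row stays valid only if it is valid in every input
        }
    }
    Some(acc)
}

fn main() {
    let a = Validity { words: vec![0b1110] };
    let b = Validity { words: vec![0b0111] };
    let c = Validity { words: vec![0b1011] };

    // An array of options can be passed directly.
    let from_array = union_many_sketch([Some(&a), None, Some(&b), Some(&c)]);

    // A slice works too, at the cost of `.iter().copied()` at the call site.
    let slice: &[Option<&Validity>] = &[Some(&a), Some(&b), Some(&c)];
    let from_slice = union_many_sketch(slice.iter().copied());

    assert_eq!(from_array, from_slice);
    assert_eq!(from_array.unwrap().words, vec![0b0010]);
    assert_eq!(union_many_sketch([None, None]), None);
}
```

In this sketch the iterator form accepts arrays and lazily built inputs without an intermediate collection, while slice callers pay only a `.copied()`; a slice parameter is simpler to read and avoids the explicit lifetime.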
I should probably note that the BooleanArray API additions are in addition to NullBuffer::union_many -- they are not required to be in the same PR.
(I am not asking for changes here; I am just observing.)
alamb
left a comment
Looks great -- thanks again @mbutrovich

Which issue does this PR close?
`BooleanArray` that potentially reuse the underlying allocation #8809.

Rationale for this change

Several DataFusion PRs (#21464, #21468, #21471, #21475, #21477, #21482, #21532) optimize NULL handling in scalar functions by replacing row-by-row null buffer construction with bulk `NullBuffer::union`. When 3+ null buffers need combining, they chain binary `union` calls, each allocating a new `BooleanBuffer`. `NullBuffer::union_many` reduces this to 1 allocation (clone + in-place ANDs). For example, from #21482:

Before:

After:

Per @alamb's suggestion, this PR also implements the general-purpose mutable bitwise operations on `BooleanArray` from #8809, following the `PrimitiveArray::unary`/`unary_mut` pattern. This builds on the `BitAndAssign`/`BitOrAssign`/`BitXorAssign` operators added to `BooleanBuffer` in #9567.

What changes are included in this PR?
`NullBuffer::union_many(impl IntoIterator<Item = Option<&NullBuffer>>)`: combines multiple null buffers in a single allocation (clone + in-place `&=`). Used by DataFusion for bulk null handling.

`BooleanArray` bitwise operations (6 new public methods):

Unary (`op: FnMut(u64) -> u64`):
- `bitwise_unary(&self, op)`: always allocates a new array
- `bitwise_unary_mut(self, op) -> Result<Self, Self>`: in-place if uniquely owned, `Err(self)` if shared
- `bitwise_unary_mut_or_clone(self, op)`: in-place if uniquely owned, allocates if shared

Binary (`op: FnMut(u64, u64) -> u64`):
- `bitwise_bin_op(&self, rhs, op)`: always allocates, unions null buffers
- `bitwise_bin_op_mut(self, rhs, op) -> Result<Self, Self>`: in-place if uniquely owned, `Err(self)` if shared, unions null buffers
- `bitwise_bin_op_mut_or_clone(self, rhs, op)`: in-place if uniquely owned, allocates if shared, unions null buffers

Note: #8809 proposed that the binary variants take a raw buffer and `right_offset_in_bits`. This PR takes `&BooleanArray` instead, which encapsulates both and matches existing patterns like `BooleanArray::from_binary`.

Are these changes tested?
Yes. 23 tests for the `BooleanArray` bitwise methods and 6 tests for `union_many`, covering:
Err/ fallback)

Are there any user-facing changes?

Six new public methods on `BooleanArray` and one new public method on `NullBuffer`.
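
The three ownership behaviours listed for the unary methods (always allocate, in place or `Err(self)`, in place or clone) can be sketched with std only. This is an illustration under stated assumptions, not the arrow-rs code: `Bits` is a hypothetical type that stores packed bits in an `Arc<Vec<u64>>`, "uniquely owned" is modelled with `Arc::get_mut`, and null buffers, offsets, and lengths are left out.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a boolean array: packed bits in a shared buffer.
#[derive(Clone, Debug)]
struct Bits {
    words: Arc<Vec<u64>>,
}

impl Bits {
    /// Always allocates a new buffer.
    fn unary(&self, mut op: impl FnMut(u64) -> u64) -> Bits {
        Bits { words: Arc::new(self.words.iter().map(|w| op(*w)).collect()) }
    }

    /// Mutates in place if the buffer is uniquely owned;
    /// otherwise hands the untouched value back as `Err`.
    fn unary_mut(mut self, mut op: impl FnMut(u64) -> u64) -> Result<Bits, Bits> {
        match Arc::get_mut(&mut self.words) {
            Some(words) => {
                for w in words.iter_mut() {
                    *w = op(*w);
                }
                Ok(self)
            }
            None => Err(self),
        }
    }

    /// Mutates in place if possible, otherwise falls back to allocating.
    fn unary_mut_or_clone(self, mut op: impl FnMut(u64) -> u64) -> Bits {
        match self.unary_mut(&mut op) {
            Ok(done) => done,
            Err(shared) => shared.unary(op),
        }
    }
}

fn main() {
    let unique = Bits { words: Arc::new(vec![0b1010]) };
    let ptr = Arc::as_ptr(&unique.words);

    // Uniquely owned: the same allocation is reused.
    let negated = unique.unary_mut(|w| !w).expect("buffer is uniquely owned");
    assert_eq!(Arc::as_ptr(&negated.words), ptr);
    assert_eq!(negated.words[0], !0b1010u64);

    // Shared: the value comes back unchanged as `Err`.
    let other_handle = negated.clone();
    let back = negated.unary_mut(|w| !w).expect_err("buffer is shared");
    assert_eq!(back.words[0], !0b1010u64);

    // The `_or_clone` variant allocates a fresh buffer instead of failing.
    let fresh = back.unary_mut_or_clone(|w| !w);
    assert_ne!(Arc::as_ptr(&fresh.words), Arc::as_ptr(&other_handle.words));
    assert_eq!(fresh.words[0], 0b1010);
}
```

Returning `Result<Self, Self>` lets the caller decide what to do when the buffer is shared, while the `_or_clone` form suits callers that only care about the result and not about whether an allocation happened.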