HBASE-29889 Add XXH3 Hash Support to Bloom Filter by jinhyukify · Pull Request #7740 · apache/hbase

jinhyukify · 2026-02-11T18:14:57Z

Jira https://issues.apache.org/jira/browse/HBASE-29889

This PR adds XXH3 as a new Bloom filter hash type. XXH3 is designed for modern CPU architectures and shows clearly better performance than the existing Jenkins/Murmur/Murmur3 hashes used today.

Benchmark results and brief implementation notes can be found here:
Benchmark and Design Notes (Google Doc)

Benchmark test code here: jinhyukify/xxh3-benchmark

jinhyukify · 2026-02-11T18:17:57Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/XXH3.java

+    (byte) 0x8f, (byte) 0x95, (byte) 0x16, (byte) 0x04, (byte) 0x28, (byte) 0xaf, (byte) 0xd7,
+    (byte) 0xfb, (byte) 0xca, (byte) 0xbb, (byte) 0x4b, (byte) 0x40, (byte) 0x7e, };
+
+  // Pre-converted longs from DefaultSecret to avoid reconstruction at runtime


This matches the little-endian value we get from reading the default secret bytes as-is.
Pre-computing it as a long like this gave a small performance bump when I tested.

This loads an additional set of 37 long values as static fields.
There’s some overhead from keeping these statically initialized values around, but the performance gains make it worthwhile.
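As a rough illustration of the pre-conversion idea described above, here is a minimal sketch that decodes a byte secret into little-endian `long` constants once, at class-initialization time. The class name, the field names, and the 16-byte `SECRET` array are hypothetical and chosen for illustration only; this is not the real XXH3 default secret and not the code in this PR.

```java
final class SecretLongsSketch {
  // Hypothetical 16-byte secret fragment (assumption for illustration;
  // NOT the real XXH3 default secret).
  static final byte[] SECRET = {
    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
    0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18 };

  // Decoded once when the class is initialized, so the hot path reads a
  // constant instead of reassembling 8 bytes on every call.
  static final long SECRET_00 = readLongLE(SECRET, 0);
  static final long SECRET_08 = readLongLE(SECRET, 8);

  // Reads 8 bytes starting at off as a little-endian long:
  // b[off] is the least significant byte, b[off + 7] the most significant.
  static long readLongLE(byte[] b, int off) {
    long v = 0;
    for (int i = 7; i >= 0; i--) {
      v = (v << 8) | (b[off + i] & 0xFFL);
    }
    return v;
  }
}
```

With these illustrative bytes, `SECRET_00` equals `0x0807060504030201L`, which is the same value `ByteBuffer` returns when reading offset 0 in `ByteOrder.LITTLE_ENDIAN`.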

jinhyukify · 2026-02-11T18:24:47Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/RowColBloomHashKey.java

+    // Optimization: when the offset points to the last 8 bytes,
+    // we can return the precomputed trailing long value directly.
+    if (offset + Bytes.SIZEOF_LONG == totalLength) {
+      return LAST_8_BYTES;


This approach gave better performance in the 9–16 byte hash path.

jinhyukify · 2026-02-11T18:26:38Z

hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CompoundBloomFilter.java

+    sb.append(BloomFilterUtil.STATS_RECORD_SEP + "Hash type: " + hashType)
+      .append(" (" + Optional.ofNullable(Hash.getInstance(hashType))
+        .map(i -> i.getClass().getSimpleName()).orElse("UNKNOWN") + ")");


I'd like to show the Bloom filter hash type in HFilePrettyPrinter.

jinhyukify · 2026-02-11T18:30:48Z

hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterUtil.java

+   * @param key  the hash key
+   * @return a pair of hash values (hash1, hash2)
+   */
+  public static Pair<Integer, Integer> getHashPair(Hash hash, HashKey<?> key) {


This part gave me a bit of trouble.
I put quite a lot of work into making the XXH3 implementation zero-heap, but ironically ended up adding a small object allocation in the hash calculation path.

The main reason was that the Bloom filter hash-location logic was split across two different classes, so I consolidated it into one place. The contains path is pretty hot, so I hesitated a bit, but given the performance of recent GC algorithms it might be acceptable.

Still, I'd love to hear your thoughts on this trade-off.

jinhyukify · 2026-02-11T18:32:18Z

hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterUtil.java

+      int hash1 = (int) hash64;
+      int hash2 = (int) (hash64 >>> 32);


A well-designed hash function should behave reliably even if we take either half of its 64-bit output as a 32-bit value. XXH3 is no exception.
I'm adding the XXH3 author’s comment here for reference.
Cyan4973/xxHash#453 (comment)

jinhyukify · 2026-02-11T18:37:57Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/Hash64.java

+   * @param seed    the 64-bit seed value
+   * @return the computed 64-bit hash value
+   */
+  <T> long hash64(HashKey<T> hashKey, long seed);


The goal here is to take a single 64-bit hash result and split it into two 32-bit hashes to compute the Bloom hash locations.

-------------- 64-bit hash output ---------------
|                    64 bits                    |
-------------------------------------------------
| lower 32 bits (hash1) | upper 32 bits (hash2) |
-------------------------------------------------

Since XXH3 already performs much better than the existing hashes and we no longer need to run the hash function twice, this approach gives us an additional performance win on top of the baseline speedup.
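To make the split concrete, here is a minimal sketch of deriving several Bloom bit positions from one 64-bit hash. The class and method names are hypothetical, and the `hash1 + i * hash2` stepping with `Math.floorMod` is an assumption about how the positions could be derived; the actual logic in `BloomFilterUtil` may differ.

```java
final class BloomHashSplitSketch {
  // Lower 32 bits of the 64-bit hash (hash1 in the diagram above).
  static int lower32(long hash64) {
    return (int) hash64;
  }

  // Upper 32 bits of the 64-bit hash (hash2 in the diagram above).
  static int upper32(long hash64) {
    return (int) (hash64 >>> 32);
  }

  // Derives hashCount bit positions in [0, bitSize) from a single 64-bit
  // hash by stepping hash1, hash1 + hash2, hash1 + 2 * hash2, ...
  // (assumed scheme for illustration; int overflow wraps, which is fine).
  static int[] locations(long hash64, int hashCount, int bitSize) {
    int hash1 = lower32(hash64);
    int hash2 = upper32(hash64);
    int[] out = new int[hashCount];
    int composite = hash1;
    for (int i = 0; i < hashCount; i++) {
      // floorMod keeps the result non-negative even when composite is negative.
      out[i] = Math.floorMod(composite, bitSize);
      composite += hash2;
    }
    return out;
  }
}
```

For example, `locations(0x0000000200000001L, 3, 100)` yields the positions 1, 3, and 5, all from a single hash invocation.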

jinhyukify · 2026-02-12T01:48:31Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/XXH3.java

+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class XXH3 extends Hash implements Hash64 {


I mostly followed the algorithm as described here
xxh3-algorithm-overview

Also referenced the original implementation.
https://xxhash.com/doc/v0.8.2/xxhash_8h_source.html

jinhyukify · 2026-02-13T11:49:53Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/XXH3.java

+  }
+
+  private static long mul128AndFold64(long x, long y) {
+    // Consider switching to Math.unsignedMultiplyHigh(x, y) when we can drop Java 8.


https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/java.base/share/classes/java/lang/Math.java#L1399-L1414
Using Math.multiplyHigh (JDK 9+) or Math.unsignedMultiplyHigh (JDK 18+) showed better benchmark performance because the JIT compiler replaces them with optimized CPU instructions (intrinsics).
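For context, a Java 8-compatible version of this operation can be sketched as below. This is an illustrative sketch under stated assumptions, not the code in this PR: the class name is hypothetical, and the high word is computed with the usual split into 32-bit halves. The fold is the low 64 bits of the unsigned 128-bit product XORed with the high 64 bits.

```java
final class Mul128FoldSketch {
  // High 64 bits of the unsigned 128-bit product x * y, using only
  // Java 8 operations (no Math.multiplyHigh / Math.unsignedMultiplyHigh).
  static long unsignedMultiplyHigh(long x, long y) {
    long x0 = x & 0xFFFFFFFFL, x1 = x >>> 32;
    long y0 = y & 0xFFFFFFFFL, y1 = y >>> 32;
    long z0 = x0 * y0;                      // low  x low
    long t = x1 * y0 + (z0 >>> 32);         // high x low, plus carry from z0
    long z1 = (t & 0xFFFFFFFFL) + x0 * y1;  // low  x high, plus middle bits
    long z2 = t >>> 32;
    return x1 * y1 + z2 + (z1 >>> 32);      // high x high, plus carries
  }

  // 128-bit multiply, then fold to 64 bits: low64 XOR high64.
  static long mul128AndFold64(long x, long y) {
    long low = x * y; // wrapping multiply yields the low 64 bits
    long high = unsignedMultiplyHigh(x, y);
    return low ^ high;
  }
}
```

On JDK 18 and later, the hand-written high-word computation could be replaced by `Math.unsignedMultiplyHigh(x, y)`, which is the switch the code comment above refers to.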

Apache9 · 2026-02-14T06:25:07Z

We need to implement this algorithm by ourselves? No existing libraries available?

Apache9 · 2026-02-14T06:28:10Z

Like this one?

https://github.com/OpenHFT/Zero-Allocation-Hashing

jinhyukify · 2026-02-14T08:40:40Z

@Apache9 Thank you for reviewing this first. 😄

I first evaluated Zero-Allocation-Hashing and hash4j.
The performance results were as follows:

There are several reasons why I decided not to adopt these libraries.

Zero-Allocation-Hashing (ZAH)

HBase provides fallback logic for Unsafe access and therefore does not strictly depend on Unsafe. However, Zero-Allocation-Hashing relies on Unsafe when reading the secret.
https://github.com/OpenHFT/Zero-Allocation-Hashing/blob/ea/src/main/java/net/openhft/hashing/XXH3.java#L17

Also, the HBase implementation showed slightly better performance. I believe this comes from an optimization where I pre-decode the input bytes into long values (precomputed long loads).

Hash4j

hash4j shows better performance starting from medium input lengths, but it only supports JDK 11 and above. The observed performance difference appears to come from intrinsic optimizations of functions that were introduced in Java 9 and later.
https://github.com/dynatrace-oss/hash4j/blob/main/src/main/java/com/dynatrace/hash4j/internal/UnsignedMultiplyUtil.java#L36

In my personal view, hash function implementations typically do not require continuous maintenance. Therefore, maintaining our own implementation should not be considered a significant drawback. WDYT?

jinhyukify added 2 commits February 12, 2026 03:06

HBASE-29889 Add LittleEndianBytes utility for fast LE primitive access

31733b1

HBASE-29889 Extend HashKey with bulk little-endian accessors for fast…

e6a302e

… hashing

jinhyukify force-pushed the HBASE-29889 branch from 3453eaa to b7411bd Compare February 11, 2026 18:24

jinhyukify added 3 commits February 12, 2026 03:41

HBASE-29889 Implement XXH3 64bit hashing

fbdc25c

HBASE-29889 Add 64bit Bloom filter hash support

9281600

HBASE-29889 Add XXH3 to Bloom filter hashing

db169a0

jinhyukify force-pushed the HBASE-29889 branch from b7411bd to db169a0 Compare February 11, 2026 18:41

HBASE-29889 Fix CI failure

53f0315

jinhyukify force-pushed the HBASE-29889 branch from cba1a3f to 53f0315 Compare February 13, 2026 15:18

jinhyukify wants to merge 6 commits into apache:master from jinhyukify:HBASE-29889

2 participants
