HBASE-29889 Add XXH3 Hash Support to Bloom Filter#7740
HBASE-29889 Add XXH3 Hash Support to Bloom Filter#7740jinhyukify wants to merge 6 commits intoapache:masterfrom
Conversation
3453eaa to
b7411bd
Compare
b7411bd to
db169a0
Compare
| (byte) 0x8f, (byte) 0x95, (byte) 0x16, (byte) 0x04, (byte) 0x28, (byte) 0xaf, (byte) 0xd7, | ||
| (byte) 0xfb, (byte) 0xca, (byte) 0xbb, (byte) 0x4b, (byte) 0x40, (byte) 0x7e, }; | ||
|
|
||
| // Pre-converted longs from DefaultSecret to avoid reconstruction at runtime |
There was a problem hiding this comment.
This matches the little-endian value we get from reading the default secret bytes as-is.
Pre-computing it as a long like this gave a small performance bump when I tested.
There was a problem hiding this comment.
This loads an additional set of 37 long values as static fields.
There’s some overhead from keeping these statically initialized values around, but the performance gains make it worthwhile.
| // Optimization: when the offset points to the last 8 bytes, | ||
| // we can return the precomputed trailing long value directly. | ||
| if (offset + Bytes.SIZEOF_LONG == totalLength) { | ||
| return LAST_8_BYTES; |
There was a problem hiding this comment.
This approach gave better performance in the 9–16 byte hash path.
| sb.append(BloomFilterUtil.STATS_RECORD_SEP + "Hash type: " + hashType) | ||
| .append(" (" + Optional.ofNullable(Hash.getInstance(hashType)) | ||
| .map(i -> i.getClass().getSimpleName()).orElse("UNKNOWN") + ")"); |
There was a problem hiding this comment.
I'd like to show the Bloom filter hash type in HFilePrettyPrinter.
| * @param key the hash key | ||
| * @return a pair of hash values (hash1, hash2) | ||
| */ | ||
| public static Pair<Integer, Integer> getHashPair(Hash hash, HashKey<?> key) { |
There was a problem hiding this comment.
This part gave me a bit of trouble.
I put quite a lot of work into making the XXH3 implementation zero-heap, but ironically ended up adding a small object allocation in the hash calculation path.
The main reason was that the Bloom filter hash-location logic was split across two different classes, so I consolidated it into one place. This contains path is pretty hot, so I hesitated a bit, but given recent GC algorithm performance it might be acceptable.
Still, I'd love to hear your thoughts on this trade-off.
| int hash1 = (int) hash64; | ||
| int hash2 = (int) (hash64 >>> 32); |
There was a problem hiding this comment.
A well-designed hash function should behave reliably even if we take either half of its 64-bit output as a 32-bit value. XXH3 is no exception.
I'm adding the XXH3 author’s comment here for reference.
Cyan4973/xxHash#453 (comment)
| * @param seed the 64-bit seed value | ||
| * @return the computed 64-bit hash value | ||
| */ | ||
| <T> long hash64(HashKey<T> hashKey, long seed); |
There was a problem hiding this comment.
The goal here is to take a single 64-bit hash result and split it into two 32-bit hashes to compute the Bloom hash locations.
-------------- 64-bit hash output --------------
| 64 bits |
------------------------------------------------
| lower 32 bits (hash1) |
| upper 32 bits (hash2) |
------------------------------------------------
Since XXH3 already performs much better than the existing hashes and we no longer need to run the hash function twice, this approach gives us an additional performance win on top of the baseline speedup.
| */ | ||
| @InterfaceAudience.Private | ||
| @InterfaceStability.Unstable | ||
| public class XXH3 extends Hash implements Hash64 { |
There was a problem hiding this comment.
I mostly followed the algorithm as described here
xxh3-algorithm-overview
Also referenced the original implementation.
https://xxhash.com/doc/v0.8.2/xxhash_8h_source.html
| } | ||
|
|
||
| private static long mul128AndFold64(long x, long y) { | ||
| // Consider switching to Math.unsignedMultiplyHigh(x, y) when we can drop Java 8. |
There was a problem hiding this comment.
https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/java.base/share/classes/java/lang/Math.java#L1399-L1414
Using Math.multiplyHigh(>= jdk 9) or Math.unsignedMultiplyHigh (>= jdk 18) showed better benchmark performance because the JIT compiler replaces them with optimized CPU instructions (intrinsics).
cba1a3f to
53f0315
Compare
|
We need to implement this algorithm by ourselves? No existing libraries available? |
|
Like this one? |
|
@Apache9 Thank you for reviewing this first. 😄 I first evaluated Zero-Allocation-Hashing and hash4j. There are several reasons why I decided not to adopt these libraries. Zero-Allocation-Hashing (ZAH)HBase provides fallback logic for And HBase implementation showed slightly better performance. I believe this comes from an optimization where I pre-decode the input bytes into long values (precomputed long loads) Hash4jhash4j shows better performance starting from medium input lengths, but it only supports JDK 11 and above. The observed performance difference appears to come from intrinsic optimizations of functions that were introduced in Java 9 and later. In my personal view, hash function implementations typically do not require continuous maintenance. Therefore, maintaining our own implementation should not be considered a significant drawback. WDYT? |

Jira https://issues.apache.org/jira/browse/HBASE-29889
This PR adds XXH3 as a new Bloom filter hash type. XXH3 is designed for modern CPU architectures and shows clearly better performance than the existing Jenkins/Murmur/Murmur3 hashes used today.
Benchmark results and brief implementation notes can be found here:
Benchmark and Design Notes (Google Doc)
Benchmark test code here: jinhyukify/xxh3-benchmark