Implement per column compression by rahil-c · Pull Request #3396 · apache/parquet-java

rahil-c · 2026-02-16T23:42:48Z

Rationale for this change

Issue raised here: apache/parquet-format#553

The Parquet spec already supports per-column compression: each column chunk stores its own CompressionCodecName in the footer metadata. However, the parquet-java writer API currently forces a single compression codec for all columns in a file. This PR addresses that gap by exposing per-column compression configuration through the existing ColumnProperty infrastructure.

What changes are included in this PR?

ParquetProperties: Added a ColumnProperty for the compression codec, following the same pattern used for dictionary encoding and bloom filters.
ColumnChunkPageWriteStore: Added a new constructor that accepts a CompressionCodecFactory and ParquetProperties.
InternalParquetRecordWriter: Added a new constructor accepting a CompressionCodecFactory instead of a single BytesInputCompressor.
ParquetWriter: Added a withCompressionCodec(String, CompressionCodecName) builder method. Updated the core constructor to pass the CompressionCodecFactory through to the writer stack.
ParquetOutputFormat: Added a ColumnConfigParser entry so per-column compression can be configured via Hadoop config keys (parquet.compression#<column_path>=<CODEC_NAME>).
ParquetRecordWriter: Updated to pass the CompressionCodecFactory to InternalParquetRecordWriter.
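
The "default plus per-column override" lookup that this change relies on can be sketched roughly as below. This is a minimal standalone illustration, not the PR's code: the class `PerColumnCodecs`, its `Codec` enum, and the methods `withCodec` and `codecFor` are hypothetical names, and the real ColumnProperty class in parquet-java may resolve columns differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-column property: one default plus per-column overrides. */
final class PerColumnCodecs {
    /** Illustrative codec names; parquet-java's real enum is CompressionCodecName. */
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, ZSTD }

    private final Codec defaultCodec;
    private final Map<String, Codec> overrides = new HashMap<>();

    PerColumnCodecs(Codec defaultCodec) {
        this.defaultCodec = defaultCodec;
    }

    /** Registers an override for one column, keyed by its column path. */
    PerColumnCodecs withCodec(String columnPath, Codec codec) {
        overrides.put(columnPath, codec);
        return this;
    }

    /** Returns the column's override if one exists, otherwise the file-wide default. */
    Codec codecFor(String columnPath) {
        return overrides.getOrDefault(columnPath, defaultCodec);
    }
}
```

With a SNAPPY default and an UNCOMPRESSED override registered for "embeddings", `codecFor("embeddings")` resolves to the override while every other column path falls back to the default.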

Are these changes tested?

Yes, tests are added in this PR.

Are there any user-facing changes?

Two new public APIs are introduced:

  ParquetWriter.builder(path)
      .withCompressionCodec(CompressionCodecName.SNAPPY)  // default for all columns
      .withCompressionCodec("embeddings", CompressionCodecName.UNCOMPRESSED)  // per-column override
      .build();

  Hadoop configuration (new key pattern):
  parquet.compression#<column_path>=<CODEC_NAME>
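
To illustrate how keys of this shape could be turned into per-column settings, here is a minimal sketch that uses a plain `Map<String, String>` in place of a Hadoop Configuration. The class `ColumnCompressionKeys` and its `parse` method are hypothetical; this is not the ColumnConfigParser implementation, and codec-name validation is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: collects per-column codec names from "parquet.compression#<column_path>" keys. */
final class ColumnCompressionKeys {
    static final String PREFIX = "parquet.compression#";

    /** Returns column path -> codec name for every key that starts with the prefix. */
    static Map<String, String> parse(Map<String, String> conf) {
        Map<String, String> perColumn = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String key = entry.getKey();
            // Skip unrelated keys and the bare prefix with no column path after '#'.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                perColumn.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return perColumn;
    }
}
```

In this sketch the file-wide key parquet.compression is ignored by `parse`, so the default codec would still be read separately and the returned map holds only the overrides.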

cc @julienledem @emkornfield

[draft] Implement per column compression

a1c1d45

rahil-c marked this pull request as ready for review February 17, 2026 02:44

rahil-c changed the title ~~[draft] Implement per column compression~~ Implement per column compression Feb 17, 2026

rahil-c wants to merge 1 commit into apache:master from rahil-c:rahil/per-column-compression
