[lake/iceberg] Add rest catalog cache #2622

Open
bakjos wants to merge 1 commit into apache:main from bakjos:bakjos/iceberg_rest_cache

@bakjos bakjos commented Feb 9, 2026

Purpose

Query performance for the data lake table is very slow compared to querying remote storage. This PR adds per-task lazy caching of the Iceberg Catalog and Table inside IcebergLakeSource, so that createRecordReader reuses a single loadTable result for all lake splits in a Flink source task. This eliminates the O(splits) REST round-trips incurred when using a REST catalog.

Before: N splits → N × (createCatalog + loadTable) → N REST calls per task.
After: N splits → 1 × (createCatalog + loadTable) on first split, then N-1 reuses → 1 REST loadTable per task. With TTL enabled, the cache is refreshed after the TTL period so externally changed table metadata is picked up.
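The before/after behavior can be illustrated with a minimal sketch. This is not the code in this PR: `LazyTtlCache`, its constructor parameters, and the injectable clock are hypothetical names chosen for illustration, and the loader stands in for `createCatalog + loadTable`. It assumes a TTL of 0 means the cached value never expires, matching the config documentation in this change.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical per-task cache: loads lazily on first use, reloads after the TTL elapses. */
final class LazyTtlCache<T> {
    private final Supplier<T> loader;    // stand-in for createCatalog + loadTable
    private final long ttlMs;            // 0 disables expiry (assumption from the config doc)
    private final LongSupplier clockMs;  // injectable clock so the sketch is testable
    private T value;
    private long loadedAtMs;
    private boolean loaded;

    LazyTtlCache(Supplier<T> loader, long ttlMs, LongSupplier clockMs) {
        if (ttlMs < 0) {
            throw new IllegalArgumentException("ttlMs must be >= 0");
        }
        this.loader = loader;
        this.ttlMs = ttlMs;
        this.clockMs = clockMs;
    }

    /** Returns the cached value, loading it on first call or once the TTL has elapsed. */
    synchronized T get() {
        long now = clockMs.getAsLong();
        boolean expired = loaded && ttlMs > 0 && now - loadedAtMs >= ttlMs;
        if (!loaded || expired) {
            value = loader.get();
            loadedAtMs = now;
            loaded = true;
        }
        return value;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        long[] now = {0L}; // fake clock, in milliseconds
        LazyTtlCache<String> cache = new LazyTtlCache<>(
                () -> "table-v" + loads.incrementAndGet(), 300_000L, () -> now[0]);

        for (int split = 0; split < 5; split++) {
            cache.get(); // five splits share one load
        }
        System.out.println(loads.get()); // prints 1

        now[0] = 300_000L; // advance the clock past the 5-minute TTL
        cache.get();
        System.out.println(loads.get()); // prints 2
    }
}
```

With five splits the loader runs once; it runs again only after the fake clock passes the TTL, which is the "1 REST loadTable per task, refreshed after TTL" behavior described above.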

Linked issue: close #2619

Brief change log

Tests

API and Format

Documentation

/**
 * Config key for table cache TTL in milliseconds. After this duration, the cached table is
 * reloaded on next use. Set to 0 to disable TTL (cache never expires). Default: 5 minutes.
 */
public static final String TABLE_CACHE_TTL_MS_KEY = "iceberg.catalog.table-cache-ttl-ms";
Contributor

I don't think this is actually needed anymore. In Iceberg 1.11.0 this is already solved natively in the REST catalog through freshness-aware loading of tables using ETags (apache/iceberg#14398). The Iceberg community is planning to release 1.11.0 with that feature within the next few weeks.
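For context, ETag-based freshness checks generally follow HTTP conditional-request semantics: the client sends the last ETag it saw in `If-None-Match`, and the server answers `304 Not Modified` with no body when nothing has changed. The sketch below is a generic illustration only, not the code from apache/iceberg#14398; `FakeServer`, `Client`, and `Response` are hypothetical in-memory stand-ins.

```java
final class EtagRevalidation {
    /** Hypothetical response: status 200 carries a body, status 304 does not. */
    static final class Response {
        final int status;
        final String etag;
        final String body;

        Response(int status, String etag, String body) {
            this.status = status;
            this.etag = etag;
            this.body = body;
        }
    }

    /** In-memory stand-in for a REST endpoint serving one table's metadata. */
    static final class FakeServer {
        private String body = "metadata-v1";
        private int version = 1;
        int fullResponses = 0; // counts responses that carried a full body

        void update(String newBody) {
            body = newBody;
            version++;
        }

        Response load(String ifNoneMatch) {
            String etag = "\"v" + version + "\"";
            if (etag.equals(ifNoneMatch)) {
                return new Response(304, etag, null); // unchanged: no body sent
            }
            fullResponses++;
            return new Response(200, etag, body);
        }
    }

    /** Client that keeps the last body and revalidates it with the stored ETag. */
    static final class Client {
        private final FakeServer server;
        private String cachedEtag;
        private String cachedBody;

        Client(FakeServer server) {
            this.server = server;
        }

        String loadTable() {
            Response r = server.load(cachedEtag);
            if (r.status == 200) {
                cachedEtag = r.etag;
                cachedBody = r.body;
            }
            return cachedBody; // on 304 the cached copy is still fresh
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        Client client = new Client(server);
        client.loadTable();
        client.loadTable();
        client.loadTable();
        System.out.println(server.fullResponses); // prints 1

        server.update("metadata-v2");
        System.out.println(client.loadTable());   // prints metadata-v2
        System.out.println(server.fullResponses); // prints 2
    }
}
```

Unlike a TTL, this approach still makes one round-trip per load, but the full metadata is transferred only when the table has actually changed.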

Contributor Author

Yes, this issue will be resolved once Apache Iceberg 1.11.0 is released, though we are unsure of the exact timeline. Additionally, are there any plans to update Fluss to use Iceberg 1.11.0?

Contributor

Once 1.11.0 is released, I'll update it in Fluss right away. I'm working in the Iceberg community, and we're planning to release 1.11.0 within the next few weeks.

Development

Successfully merging this pull request may close these issues.

Iceberg s3 table tearing bugs

2 participants