DRILL-8529: Caching QueryPlan Results by vdegans · Pull Request #3023 · apache/drill

vdegans · 2025-09-11T12:10:56Z

DRILL-8529: Caching QueryPlan Results

Description

Implements a caching mechanism for query plans and transformations, to shorten the prepare phase.

Documentation

The cache behavior can be customized via drill-override.conf:

planner {
  query {
    cache {
      max_entries_amount: 100       # Maximum number of cached query plans (default: 100)
      plan_cache_ttl_minutes: 300   # Time-to-live for cached query plans in minutes (default: 300)
    }
  }
  transform {
    cache {
      max_entries_amount: 100       # Maximum number of cached transform plans (default: 100)
      plan_cache_ttl_minutes: 300   # Time-to-live for cached transform plans in minutes (default: 300)
    }
  }
}

max_entries_amount: limits the number of cached plans. Older entries are evicted when the limit is reached.
plan_cache_ttl_minutes: sets the lifetime of cached plans. Expired entries are recomputed on next use.

At runtime, caching can also be toggled with:
planner.cache.enable (true = enabled, false = disabled)
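
To make the two settings concrete, here is a minimal sketch of a cache bounded by an entry count and a time-to-live. It is not the code from this PR: the discussion below suggests the PR uses Caffeine, while this sketch swaps in a plain `LinkedHashMap` so that it runs with the JDK alone. The class name `BoundedTtlCache`, the injected clock, and the least-recently-used eviction order are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a plan cache: bounded entry count plus time-to-live. */
final class BoundedTtlCache<K, V> {
  /** A cached value together with the time at which it expires. */
  private static final class TimedValue<V> {
    final V value;
    final long expiresAtMillis;

    TimedValue(V value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final int maxEntries;   // plays the role of max_entries_amount
  private final long ttlMillis;   // plays the role of plan_cache_ttl_minutes
  private final LongSupplier clock;
  private final LinkedHashMap<K, TimedValue<V>> map;

  BoundedTtlCache(int maxEntries, long ttlMinutes, LongSupplier clock) {
    this.maxEntries = maxEntries;
    this.ttlMillis = ttlMinutes * 60_000L;
    this.clock = clock;
    // accessOrder = true keeps the least recently used entry first (assumed policy).
    this.map = new LinkedHashMap<K, TimedValue<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, TimedValue<V>> eldest) {
        return size() > BoundedTtlCache.this.maxEntries;
      }
    };
  }

  synchronized void put(K key, V value) {
    map.put(key, new TimedValue<>(value, clock.getAsLong() + ttlMillis));
  }

  /** Returns the cached value, or null if the key is absent or its entry has expired. */
  synchronized V get(K key) {
    TimedValue<V> entry = map.get(key);
    if (entry == null) {
      return null;
    }
    if (clock.getAsLong() >= entry.expiresAtMillis) {
      map.remove(key);  // expired: the caller is expected to recompute and put again
      return null;
    }
    return entry.value;
  }

  synchronized int size() {
    return map.size();
  }
}
```

Passing the clock in as a `LongSupplier` (for example `System::currentTimeMillis`) keeps expiry testable without sleeping. A production cache would more likely rely on a library that handles eviction and expiry itself.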

Testing

Manual testing shows reduced query planning time for repeated large query plans.
Automated tests are being added to verify correctness and cache eviction behavior.

cgivre · 2025-09-11T13:02:47Z

@vdegans Wow! This is an impressive first contribution to Drill! Before we start review, would you please do a clean rebase on master? I'm sure you didn't mean to pull in all those old versions.

vdegans · 2025-09-11T13:39:31Z

@cgivre Thanks! I fixed the rebase.

cgivre · 2025-09-11T14:08:53Z

> @cgivre Thanks! I fixed the rebase.

Thanks. Before I start review I had a few questions:

Is there ever a case where someone might want different cache settings for different storage plugins?
Or, is there a situation where a user might want to disable caching entirely for certain plugins but not others?

What I'm getting at is whether it would make sense to keep the global settings you already have, but also give users the ability to set custom settings for specific plugins. I genuinely don't know whether that is worth the effort. I could imagine this being more of an issue with data where the schema could change (MongoDB or JSON files, for instance) and queries like SELECT * might bring back different data every time you run them.

vdegans · 2025-09-11T14:16:53Z

Thanks, that’s a great point. I agree that giving users control over caching at the per-plugin level makes sense. Some plugins, especially ones where the underlying data might change frequently, like MongoDB or JSON files, could benefit from having caching disabled or customized independently from the global settings. I think supporting per-plugin cache configuration would give users the flexibility to optimize caching behavior for their specific use cases and improve the overall user experience.

cgivre · 2025-09-18T15:29:02Z

@vdegans Hi Vincent, Any update?

vdegans · 2025-09-19T13:44:27Z

Hi @cgivre, I was thinking about the suggestion for a setting to enable/disable caching per plugin and I got stuck on the idea of when to cache and when not to.
I think the best solution right now is to disable caching altogether when one of the plugins used by the query has caching disabled, since I am not sure how a partially cached query plan would work (if it could even work).

I didn't get much time to look at the code yet, but I would like to hear your thoughts about the settings per plugin.
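
The all-or-nothing rule described above could be sketched roughly as follows. This is only an illustration of the idea, not code from the PR: the class `PluginCachePolicy`, the method `isPlanCacheable`, and the per-plugin flag map are hypothetical, and plugins without an explicit flag are assumed to allow caching.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decides whether a plan may be cached at all. */
final class PluginCachePolicy {
  private PluginCachePolicy() {
  }

  /**
   * A plan is cacheable only if caching is enabled globally and no plugin
   * referenced by the query has caching disabled.
   *
   * @param globalEnabled  value of the global cache toggle
   * @param pluginsInQuery names of the storage plugins the query touches
   * @param perPlugin      assumed per-plugin flags; missing names default to allowed
   */
  static boolean isPlanCacheable(boolean globalEnabled,
                                 List<String> pluginsInQuery,
                                 Map<String, Boolean> perPlugin) {
    if (!globalEnabled) {
      return false;
    }
    for (String plugin : pluginsInQuery) {
      if (!perPlugin.getOrDefault(plugin, Boolean.TRUE)) {
        return false;  // one opted-out plugin disables caching for the whole plan
      }
    }
    return true;
  }
}
```

With this rule there is never a partially cached plan: a query joining a cache-enabled plugin with a cache-disabled one is simply planned from scratch each time.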

cgivre · 2025-09-22T13:19:02Z

> Hi @cgivre, I was thinking about the suggestion for a setting to enable/disable caching per plugin and I got stuck on the idea of when to cache and when not to. I think the best solution right now is to disable caching altogether when one of the plugins used by the query has caching disabled, since I am not sure how a partially cached query plan would work (if it could even work).
>
> I didn't get much time to look at the code yet, but I would like to hear your thoughts about the settings per plugin.

The whole idea of partially cached query plans is extremely tricky. I think, though I could be wrong, that there may have been some work on that from the Calcite team at one point.

In any event, my suggestion would be to start simple. Let's get all the unit tests to pass and just start with simple caching, i.e., exact query match. Once that's done and merged, we can iterate and find improvements.
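
As a rough picture of what "exact query match" means, the sketch below keys the cache on the raw SQL text and nothing else. The class `ExactMatchPlanCache`, its methods, and the use of a `String` as the "plan" are placeholders for illustration and do not reflect the PR's actual types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical exact-match cache: the key is the SQL text, compared character by character. */
final class ExactMatchPlanCache {
  private final Map<String, String> plans = new ConcurrentHashMap<>();
  private final AtomicInteger plannerCalls = new AtomicInteger();

  /** Returns the cached plan for this exact SQL text, invoking the planner only on a miss. */
  String planFor(String sql, Function<String, String> planner) {
    return plans.computeIfAbsent(sql, query -> {
      plannerCalls.incrementAndGet();
      return planner.apply(query);
    });
  }

  /** Number of times the planner actually ran (cache misses). */
  int plannerCalls() {
    return plannerCalls.get();
  }
}
```

Because the key is the literal text, `SELECT 1` and `select 1` are two different entries. That is the price of starting simple; normalizing queries before lookup would be one of the later improvements.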

cgivre · 2025-10-09T13:33:38Z

@vdegans You should rebase on current master. I think that will solve the size limit issue you're running into.

vdegans · 2025-10-09T14:03:32Z

I think Caffeine might cause this. Should I add Caffeine to the exclude list?

cgivre · 2025-10-09T14:05:56Z

> I think Caffeine might cause this. Should I add Caffeine to the exclude list?

You can either exclude Caffeine or bump up the max size. Either is fine.

vdegans · 2025-10-09T14:24:39Z

Locally, this seems to have fixed the issue.

cgivre · 2026-02-04T15:24:32Z

@vdegans I have some time now and can assist with this. Could you please rebase on current master? Once you've done that, I'll see how I can help.

vdegans · 2026-02-04T15:39:41Z

Thank you for your time. I have just rebased the branch.

cgivre · 2026-02-04T15:46:08Z

@vdegans Can you take a look at my fork (https://github.com/cgivre/drill/tree/rebased-cache)? I think these changes will help get the CI/CD to pass.

…ogic
- Fix CustomCacheManager static initializer that crashed tests during class load by using lazy initialization with double-checked locking
- Consolidate caching logic: remove duplicate ConcurrentHashMap cache from DrillSqlWorker and redundant caching in Foreman, use CustomCacheManager only
- Clean up verbose debug logging (remove logger.info calls, printStackTrace)
- Add clearCaches() and reset() methods for testing support
- Only cache SELECT queries (not DDL/DML statements)
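
For readers unfamiliar with the pattern named in the first item, here is a minimal sketch of lazy initialization with double-checked locking. It is not the code from the fork: the method names `ensureInitialized()`, `clearCaches()`, and `reset()` are borrowed from this thread, but their signatures, the class name `LazyPlanCacheHolder`, and the map-based cache are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder showing lazy initialization instead of a static initializer. */
final class LazyPlanCacheHolder {
  // volatile is what makes double-checked locking safe under the Java memory model.
  private static volatile Map<String, String> cache;
  private static final Object LOCK = new Object();

  private LazyPlanCacheHolder() {
  }

  /** Creates the cache on first use, so nothing heavy runs during class loading. */
  static Map<String, String> ensureInitialized() {
    Map<String, String> local = cache;      // first check, without locking
    if (local == null) {
      synchronized (LOCK) {
        local = cache;                      // second check, under the lock
        if (local == null) {
          local = new ConcurrentHashMap<>();
          cache = local;
        }
      }
    }
    return local;
  }

  /** Empties the cache but keeps the instance (assumed meaning of clearCaches). */
  static void clearCaches() {
    Map<String, String> local = cache;
    if (local != null) {
      local.clear();
    }
  }

  /** Drops the instance so the next call re-initializes it (assumed meaning of reset). */
  static void reset() {
    synchronized (LOCK) {
      cache = null;
    }
  }
}
```

Moving the setup out of the static initializer means a failure while building the cache surfaces at first use rather than as a class-loading error, which is the kind of crash the commit message describes for tests.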

cgivre

Thank you very much for this contribution. I have some minor nits, and some questions:

Is there ever a situation where the underlying data would change and it would affect the query plan such that Drill isn't returning the correct data?
If possible, I'd really like to see a flag added to the metadata that is returned which would indicate that the query plan was from cache.
Do you anticipate any security issues from using cached query plans? For instance, let's say that we have user translation enabled and user 1 executes a query against a MySQL database. User 2 then tries the same query, but does not have the same access. Would user 2 be able to access the data?
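
To illustrate the third question, one possible mitigation (an assumption, not something this PR is known to implement) is to make the user name part of the cache key, so that a plan produced for one user is never served to another. The class `UserScopedPlanKey` below is hypothetical.

```java
import java.util.Objects;

/** Hypothetical cache key that scopes a cached plan to the user who issued the query. */
final class UserScopedPlanKey {
  private final String userName;
  private final String sql;

  UserScopedPlanKey(String userName, String sql) {
    this.userName = Objects.requireNonNull(userName, "userName");
    this.sql = Objects.requireNonNull(sql, "sql");
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof UserScopedPlanKey)) {
      return false;
    }
    UserScopedPlanKey other = (UserScopedPlanKey) o;
    // Both the user and the exact SQL text must match for a cache hit.
    return userName.equals(other.userName) && sql.equals(other.sql);
  }

  @Override
  public int hashCode() {
    return Objects.hash(userName, sql);
  }
}
```

With such a key, user 2 running the same SQL as user 1 gets a cache miss and goes through normal planning, including whatever access checks planning performs. Whether that is sufficient depends on where Drill enforces permissions, which this sketch does not address.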

cgivre · 2026-02-04T18:06:37Z

exec/java-exec/src/main/java/org/apache/drill/exec/cache/CustomCacheManager.java

+  }
+
+  public static void logCacheStats() {
+    ensureInitialized();


These stats would be really useful to expose in the UI. However, I'd suggest adding that in another PR. Let's get this merged first!

cgivre · 2026-02-04T18:11:39Z

exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java

+    if (planCacheEnabled) {
+      CustomCacheManager.putTransformedPlan(key, output);
+      logger.debug("Cached transform result for phase: {}", phase);
+    }


Do you think it would be possible to include a flag in the results that would indicate that the query plan was retrieved from cache?

cgivre · 2026-02-04T18:13:30Z

exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

    @SuppressWarnings("deprecation")
    final OptionDefinition[] definitions = new OptionDefinition[]{
+      new OptionDefinition(PlannerSettings.PLAN_CACHE),
+      // new OptionDefinition(PlannerSettings.PLAN_CACHE_TTL),


Please remove commented-out lines.

cgivre · 2026-02-04T18:14:55Z

exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java

   * Moves query to RUNNING state.
   */
  private void startQueryProcessing() {
+    logger.info("Starting query processing");


Please make this debug level. Drill already emits a lot of logs.

cgivre · 2026-02-04T18:15:08Z

exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java

        logger.info("Query text for query with id {} issued by {}: {}", queryIdString,
            queryContext.getQueryUserName(), sql);
        runSQL(sql);
+        logger.info("RunSQL is executed within {}", new Date().getTime() - start);


Please change to debug.

vdegans force-pushed the rebased-cache branch from 85a5b9b to 819c3f1 September 11, 2025 12:19

cgivre assigned vdegans Sep 11, 2025

cgivre added the doc-impacting (PRs that affect the documentation) and performance (PRs that Improve Performance) labels Sep 11, 2025

vdegans force-pushed the rebased-cache branch 2 times, most recently from 5ac8803 to c49c581 September 11, 2025 13:18

vdegans marked this pull request as ready for review October 9, 2025 11:46

vdegans force-pushed the rebased-cache branch from 75dc04b to 006b08a October 9, 2025 13:44

vdegans force-pushed the rebased-cache branch from 006b08a to 1e15b2a October 9, 2025 14:07

vdegans force-pushed the rebased-cache branch from 992e6f3 to 35ac0a1 November 18, 2025 13:58

alisihab and others added 8 commits February 4, 2026 16:37

introduce caching plain sql and plan

2347c16

remove logs

01a9072

fix imports

dcbab61

fix styling

8236f03

Add cache settings

35d6d45

fix property name

30c6624

add apache licence

11c0aa1

exclude caffeine to reduce jar size

08d621a

disable plan cache by default

59223a5

vdegans force-pushed the rebased-cache branch from 35ac0a1 to 59223a5 February 4, 2026 15:38

cgivre requested changes Feb 4, 2026

cgivre changed the title ~~Caching QueryPlan Results~~ DRILL-8529: Caching QueryPlan Results Feb 5, 2026
