feat(scheduler): add priority queue support #241

Open
worstell wants to merge 1 commit into main from eworstell/scheduler-priority-queue

@worstell
Contributor

Summary

Add a priority-queues config option to the scheduler. Jobs whose queue name matches a priority prefix are dequeued before non-priority jobs, while maintaining FIFO order within each tier.

Motivation

When cachew is under load (e.g., a burst of cold clone requests), important repositories like monorepos can get stuck behind a queue of less critical work. This change lets operators ensure high-value repos are always serviced first.

Configuration

scheduler {
  priority-queues = ["https://github.com/org/monorepo"]
}

Queue names are upstream URLs, so the values are intuitive. Multiple prefixes can be specified.
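
A minimal sketch of how the prefix match could work. The diff calls a method `q.isPriority(job.queue)`, but its body is not shown in this conversation, so the standalone `isPriority` helper below and its plain `strings.HasPrefix` matching are assumptions for illustration only:

```go
package main

import (
	"fmt"
	"strings"
)

// isPriority reports whether queue starts with any configured prefix.
// Hypothetical standalone helper; the PR's real method hangs off the
// scheduler queue and may differ.
func isPriority(prefixes []string, queue string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(queue, p) {
			return true
		}
	}
	return false
}

func main() {
	prefixes := []string{"https://github.com/org/monorepo"}

	fmt.Println(isPriority(prefixes, "https://github.com/org/monorepo")) // true
	fmt.Println(isPriority(prefixes, "https://github.com/org/other"))    // false
	// An empty list matches nothing, so every job stays in the default tier.
	fmt.Println(isPriority(nil, "https://github.com/org/monorepo")) // false
}
```

Under plain prefix matching as assumed here, a prefix of `https://github.com/org/monorepo` would also match a queue named `https://github.com/org/monorepo-tools`, and a prefix of `https://github.com/org/` would prioritise every repository in that organisation.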

Design

The takeNextJob scan does a single pass: it tracks the first eligible non-priority job as a fallback, but immediately selects the first eligible priority job if found. This preserves the existing O(n) scan cost with no additional data structures.

Existing behaviour is unchanged when priority-queues is empty (the default).

@worstell worstell requested a review from a team as a code owner March 30, 2026 20:25
@worstell worstell requested review from jrobotham-square and removed request for a team March 30, 2026 20:25
@worstell worstell force-pushed the eworstell/scheduler-priority-queue branch from 919d6cc to a7037a5 Compare March 30, 2026 20:32
}
if q.maxCloneConcurrency > 0 && isCloneJob(job.id) && q.activeClones >= q.maxCloneConcurrency {
continue
if q.isPriority(job.queue) {
Contributor

Does this mean it will never take a non-priority job?

Contributor Author

No, non-priority jobs get taken when there are no priority jobs left, which is just the old behavior (FIFO).

Basically, the workers wake up and look for their next job. If they come across a priority job, they take it and break. Otherwise the queue gets looped through in its entirety and the first eligible non-priority job is taken (see the lines below, where the index gets set if the job is non-priority).

Add a priority-queues config option to the scheduler that accepts a list
of queue name prefixes. Jobs whose queue matches a priority prefix are
dequeued before non-priority jobs, while maintaining FIFO order within
each tier.

This allows operators to ensure that known important repositories (e.g.,
monorepos) are never starved by a flood of cold clone jobs for less
critical repos. The queue name is the upstream URL, so configuration is
straightforward:

  scheduler {
    priority-queues = ["https://github.com/org/monorepo"]
  }

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d404e-21ec-723a-b211-c619925dd12e
@worstell worstell force-pushed the eworstell/scheduler-priority-queue branch from a7037a5 to 6f6b9e0 Compare March 30, 2026 21:40
@worstell worstell requested a review from stuartwdouglas March 30, 2026 22:30
MaxCloneConcurrency int `hcl:"max-clone-concurrency" help:"Maximum number of concurrent clone jobs. Remaining worker slots are reserved for fetch/repack/snapshot jobs. 0 means no limit." default:"0"`
SchedulerDB string `hcl:"scheduler-db" help:"Path to the scheduler state database." default:"${CACHEW_STATE}/scheduler.db"`
Concurrency int `hcl:"concurrency" help:"The maximum number of concurrent jobs to run (0 means number of cores)." default:"4"`
MaxCloneConcurrency int `hcl:"max-clone-concurrency" help:"Maximum number of concurrent clone jobs. Remaining worker slots are reserved for fetch/repack/snapshot jobs. 0 means no limit." default:"0"`
Collaborator

BTW, unrelated, but can we get rid of the tight coupling between cloning and the scheduler? We should be able to build an abstraction here, similar to what you've done with priority queues.
