Fast lane admission control: per-project rate limiting at the webhook endpoint #4573

@stuartc

Description
Context

Fast lane runs currently bypass concurrency limits entirely. This is safe at current volumes — production data shows a peak of 1 simultaneous in-flight run across all sync webhook users — but it means there's no upper bound on how many fast lane runs a given project can have in flight.

As adoption grows, we need admission control at the point of entry (the webhook endpoint) to prevent any single project from monopolising fast lane worker capacity.

Desired behaviour

When a sync webhook request arrives and the project has exceeded its fast lane limit:

  1. Reject immediately — do not enqueue the run. The run should never enter available state.
  2. Return 429 Too Many Requests with standard rate limiting headers (Retry-After, RateLimit-Limit).
  3. The caller can then implement backoff/retry, same as any rate-limited API.

This should feel predictable to the caller — if you've got a script firing requests in a loop, you get a clear signal to back off rather than silently queueing work that may never complete within the sync timeout.
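The rejection path can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Lightning webhook handler — the function and field names are hypothetical, and the real implementation would live in the Elixir webhook pipeline:

```python
def handle_sync_webhook(project_id, in_flight_count, fast_lane_limit, retry_after_s=30):
    """Admission control at the webhook endpoint: reject before enqueuing.

    If the project is at its fast lane cap, the run is never enqueued,
    so it never enters the `available` state.
    """
    if in_flight_count >= fast_lane_limit:
        return {
            "status": 429,  # Too Many Requests
            "headers": {
                "Retry-After": str(retry_after_s),
                "RateLimit-Limit": str(fast_lane_limit),
            },
            "body": f"fast lane limit reached for project {project_id}",
        }
    # Under the limit: enqueue the run as usual (elided here).
    return {"status": 200, "headers": {}, "body": "run enqueued"}
```

A caller seeing 429 backs off for `Retry-After` seconds and retries, exactly as with any rate-limited API.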

What we need

1. Per-project fast lane limit

A configurable cap on how many fast lane runs can be in-flight simultaneously for a given project. At minimum, this should be settable by super admins.

This is not the same as the existing project concurrency — it's a separate limit for the fast lane queue specifically.

2. Realtime capacity check at the webhook endpoint

Before enqueuing a sync run, the webhook handler needs to answer two questions:

  1. Has this project/workflow hit its fast lane limit? — a count of in-flight fast lane runs against the configured cap.
  2. Are there any fast lane worker slots available? — if no worker can pick this up, there's no point enqueuing it.

These are distinct checks. The first is about per-project policy, the second is about cluster-level capacity. Both need to be fast (webhook callers are waiting) and accurate enough to avoid over-admission.
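The two gates can be sketched as a single admission function. Hypothetical names, in Python for illustration — the real check would run inside the Elixir webhook handler:

```python
def admit(project_in_flight, project_limit, free_worker_slots):
    """Two distinct gates, checked in order.

    Gate 1: per-project policy — has this project hit its fast lane cap?
    Gate 2: cluster capacity — is any fast lane worker slot free?
    Returns (admitted, reason) so the endpoint can build the right response.
    """
    if project_in_flight >= project_limit:
        return (False, "project fast lane limit reached")
    if free_worker_slots <= 0:
        return (False, "no fast lane worker slots available")
    return (True, "admitted")
```

Only the first rejection reason warrants a 429 with `RateLimit-Limit`; the second is a capacity signal and might carry only `Retry-After`.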

Some mechanisms to consider:

  • Database counters for the per-project limit (simple, consistent, but adds a write to the hot path)
  • ETS/PubSub counters for a faster local view of in-flight counts (eventually consistent across nodes)
  • Worker capacity feedback via the claim protocol for the slot availability question (workers are the authority on their own capacity, but stale between claim cycles)

These aren't mutually exclusive — the solution will likely combine more than one approach.
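Whichever store holds the count, the important detail is that the check and the increment happen atomically, otherwise two concurrent webhooks can both pass the check and over-admit. A minimal single-node sketch in Python (a lock standing in for what would be an ETS table or database row lock in practice; all names hypothetical):

```python
import threading

class FastLaneCounter:
    """Local in-flight counter per project, analogous to one node's ETS view.

    try_admit does check-and-increment under a lock, so concurrent
    webhook requests on this node cannot race past the cap. Cross-node
    consistency would still need PubSub syncing or a shared DB counter.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def try_admit(self, project_id, limit):
        with self._lock:
            n = self._counts.get(project_id, 0)
            if n >= limit:
                return False  # at the cap: caller returns 429
            self._counts[project_id] = n + 1
            return True

    def release(self, project_id):
        """Called when a fast lane run completes or fails."""
        with self._lock:
            n = self._counts.get(project_id, 0)
            self._counts[project_id] = max(0, n - 1)
```

The `release` on completion is what keeps the counter honest; a missed release (crashed run) would need a reconciliation sweep against the runs table.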

Out of scope

  • Changes to the existing concurrency exemption — that stays as-is
  • Round-robin claim efficiency for dedicated queue workers (tracked separately)
