It always felt a bit funny to me that concurrency was declared in `cog.yaml`, when it felt like a thing associated with the predictor. It's the `async` on `predict()` that marks whether a model can be concurrent; max concurrency is then sort of like a default concurrency for that function.

This becomes clearer when we consider `train`, or other arbitrary functions if that's ever a thing. It's each function that has a concurrency, not the model as a whole.

Anyway, here's what Claude thinks it could look like if it was a decorator...
## Proposed API

```python
import cog
from cog import BasePredictor, Input

# Class-based predictor
class Predictor(BasePredictor):
    def setup(self):
        self.model = load_model()

    @cog.concurrent(max=5)
    async def predict(self, prompt: str = Input()) -> str:
        return await self.model.generate(prompt)

# Function-based predictor
@cog.concurrent(max=5)
async def predict(prompt: str = Input()) -> str:
    return await model.generate(prompt)
```
## Design

### Decorator behavior

- `@cog.concurrent(max=N)` attaches `__cog_concurrent_max__ = N` metadata to the decorated function.
- Applying `@cog.concurrent(max=N)` where `N > 1` to a synchronous function raises `TypeError` at decoration time.
- `@cog.concurrent` (bare, without arguments) is also supported and defaults to `max=1`.
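The behavior above can be sketched as a small decorator. This is only an illustration of the design, not the actual SDK code; the attribute name `__cog_concurrent_max__` comes from the spec above, everything else is an assumption.

```python
import asyncio
from typing import Callable, Optional

def concurrent(fn: Optional[Callable] = None, *, max: int = 1):
    """Attach a default max concurrency to a predict/train function.

    Supports both bare @concurrent (max=1) and @concurrent(max=N).
    """
    def decorate(func: Callable) -> Callable:
        # max > 1 only makes sense for async functions, so reject
        # synchronous functions at decoration time.
        if max > 1 and not asyncio.iscoroutinefunction(func):
            raise TypeError(
                f"@cog.concurrent(max={max}) requires an async function"
            )
        func.__cog_concurrent_max__ = max  # attribute name from the design above
        return func

    # Bare usage (@concurrent) passes the function directly.
    return decorate(fn) if fn is not None else decorate
```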
### Precedence (highest to lowest)

1. `COG_MAX_CONCURRENCY` environment variable at runtime (operator override)
2. `cog.yaml` `concurrency.max`, baked into the Dockerfile as `COG_MAX_CONCURRENCY` (deprecated, see below)
3. `@cog.concurrent(max=N)` decorator on the predict function
4. Default: `1`

Since `cog.yaml` `concurrency.max` works by setting the env var at build time, cases 1 and 2 are effectively the same mechanism. The decorator becomes the new default path for model authors.
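In other words, resolution looks something like this. A sketch only: the function name and shape are illustrative, not the actual coglet code, though the env var name is real.

```python
import os

def resolve_max_concurrency(decorator_value=None, default=1):
    # Highest priority: the env var, whether set by an operator at
    # runtime or baked in at build time from cog.yaml's concurrency.max.
    env = os.environ.get("COG_MAX_CONCURRENCY")
    if env is not None:
        return int(env)
    # Next: the @cog.concurrent(max=N) decorator metadata, if present.
    if decorator_value is not None:
        return decorator_value
    # Default: serial execution.
    return default
```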
### Deprecation of the `cog.yaml` `concurrency` field

- When `concurrency` is present in `cog.yaml`, emit a deprecation warning during validation suggesting users move to `@cog.concurrent`.
- The field continues to work (no breaking change): it still generates the `COG_MAX_CONCURRENCY` env var, which takes precedence over the decorator.
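The check itself is trivial; sketched here in Python for illustration, though the real warning would live in the Go CLI's `validate.go`. The function name and message wording are made up.

```python
import warnings

def warn_if_deprecated_concurrency(config: dict) -> None:
    """Warn if the parsed cog.yaml (as a dict) uses the deprecated field."""
    if "concurrency" in config:
        warnings.warn(
            "concurrency in cog.yaml is deprecated; "
            "use the @cog.concurrent decorator instead",
            DeprecationWarning,
        )
```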
## Implementation plan

### Python SDK (`python/cog/`)

- New file `python/cog/concurrent.py` with the decorator implementation.
- Export `concurrent` from `python/cog/__init__.py`.
### Rust coglet (`crates/coglet-python/`)

- In `predictor.rs`: after loading the predictor, read `__cog_concurrent_max__` from the predict function (via `getattr`).
- In `lib.rs`: change `read_max_concurrency()` to accept the decorator value as a fallback, with the env var taking priority.
- Thread the decorator value from `PythonPredictor::load()` through to `OrchestratorConfig`.
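Sketched in Python (the actual code would be Rust via pyo3), the attribute lookup amounts to the following. The helper name is hypothetical; the attribute name comes from the design above.

```python
from typing import Optional

def read_decorator_max(predictor) -> Optional[int]:
    # Handle both class-based predictors (look up the predict method,
    # which forwards attribute access to the underlying function) and
    # bare function-based predictors.
    predict_fn = getattr(predictor, "predict", predictor)
    return getattr(predict_fn, "__cog_concurrent_max__", None)
```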
### Go CLI (`pkg/config/`)

- In `validate.go`: add a deprecation warning when `concurrency` is present in `cog.yaml`.
### Tests

- Python unit tests for the decorator (sync rejection, metadata attachment).
- Rust tests for the new resolution logic (env var > decorator > default).
- New integration test (`.txtar`) verifying end-to-end: the decorator sets concurrency without the `cog.yaml` field.
### Docs

- Document the `@cog.concurrent` decorator.
- Update the concurrency docs to recommend the decorator as the primary approach.
- Regenerate `docs/llms.txt`.
## What does NOT change

- The `PermitPool`, slot transport, and orchestrator logic in `crates/coglet/`: they just receive `num_slots` from a different source.
- Async/sync detection logic in `predictor.rs`.
- Existing `cog.yaml` configs with `concurrency.max` continue to work.