feat(metrics): add scheduler and git operation OTel metrics #240

Merged
worstell merged 1 commit into main from eworstell/scheduler-git-metrics
Mar 30, 2026

@worstell worstell commented Mar 30, 2026

Summary

Add OpenTelemetry metrics for operational visibility into the job scheduler and git strategy. These are the first custom application-level metrics in cachew — all prior OTel usage was limited to the otelhttp middleware.

Scheduler metrics

| Metric | Type | Description |
| --- | --- | --- |
| cachew.scheduler.queue_depth | Gauge | Pending jobs in the scheduler queue |
| cachew.scheduler.active_workers | Gauge | Workers currently executing jobs |
| cachew.scheduler.active_clones | Gauge | Clone jobs currently executing (subset of active workers) |
| cachew.scheduler.jobs_total | Counter | Completed jobs, by job.type and status |
| cachew.scheduler.job_duration_seconds | Histogram | Job duration, by job.type and status |

Job types: clone, fetch, snapshot, repack, other.
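As a rough illustration of how the three gauges relate to each other, here is a minimal stdlib-only sketch. It is not the code in this PR: the `Gauges` type, `NormalizeJobType`, `Enqueue`, and `Start` are hypothetical names, and the assumption that any unrecognized job name is reported as `other` is inferred only from the job type list above. In a real setup the atomic values would presumably be read by OTel gauge instruments rather than printed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Gauges is a hypothetical holder for the values behind
// queue_depth, active_workers, and active_clones.
type Gauges struct {
	QueueDepth    atomic.Int64
	ActiveWorkers atomic.Int64
	ActiveClones  atomic.Int64
}

// NormalizeJobType maps a job name onto the bounded job.type set
// (clone, fetch, snapshot, repack, other) so the label stays low-cardinality.
// The fallback to "other" is an assumption for this sketch.
func NormalizeJobType(name string) string {
	switch name {
	case "clone", "fetch", "snapshot", "repack":
		return name
	default:
		return "other"
	}
}

// Enqueue records that one more job is waiting in the queue.
func (g *Gauges) Enqueue() { g.QueueDepth.Add(1) }

// Start moves one job from the queue onto a worker and returns a
// function that must be called once the job finishes.
func (g *Gauges) Start(jobType string) (done func()) {
	g.QueueDepth.Add(-1)
	g.ActiveWorkers.Add(1)
	isClone := jobType == "clone"
	if isClone {
		// Clones are counted in both gauges, so active_clones <= active_workers.
		g.ActiveClones.Add(1)
	}
	return func() {
		if isClone {
			g.ActiveClones.Add(-1)
		}
		g.ActiveWorkers.Add(-1)
	}
}

func main() {
	var g Gauges
	g.Enqueue()
	g.Enqueue()
	g.Enqueue()

	doneClone := g.Start(NormalizeJobType("clone"))
	doneOther := g.Start(NormalizeJobType("gc")) // "gc" is reported as "other"

	// prints "1 2 1": one job queued, two workers busy, one of them cloning
	fmt.Println(g.QueueDepth.Load(), g.ActiveWorkers.Load(), g.ActiveClones.Load())

	doneClone()
	doneOther()

	// prints "1 0 0"
	fmt.Println(g.QueueDepth.Load(), g.ActiveWorkers.Load(), g.ActiveClones.Load())
}
```

Returning a `done` closure from `Start` keeps the increment and decrement paired, which makes it harder for a gauge to drift when a job exits early.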

Git operation metrics

| Metric | Type | Description |
| --- | --- | --- |
| cachew.git.operations_total | Counter | Git operations by operation, upstream, status |
| cachew.git.operation_duration_seconds | Histogram | Operation duration by operation, upstream, status |
| cachew.git.requests_total | Counter | HTTP requests by type (upload-pack, snapshot, bundle, receive-pack) |

Context

Part of the operational hardening effort. The immediate goal is visibility into worker utilization and queue depth so we can detect saturation (e.g. clone storms from service accounts) before it impacts users.

All metrics flow through the existing OTel/Prometheus pipeline and are scraped on :9102/metrics automatically.

@worstell worstell requested a review from a team as a code owner March 30, 2026 20:09
@worstell worstell requested review from alecthomas and removed request for a team March 30, 2026 20:09
Add OpenTelemetry metrics for operational visibility into the job
scheduler and git strategy:

Scheduler metrics:
- cachew.scheduler.queue_depth: pending jobs gauge
- cachew.scheduler.active_workers: running workers gauge
- cachew.scheduler.active_clones: clone-specific concurrency gauge
- cachew.scheduler.jobs_total: counter by job type and status
- cachew.scheduler.job_duration_seconds: histogram by job type and status

Git operation metrics:
- cachew.git.operations_total: counter by operation/upstream/status
- cachew.git.operation_duration_seconds: histogram for clone/fetch/snapshot
- cachew.git.requests_total: HTTP request counter by type

All metrics flow through the existing OTel/Prometheus pipeline and are
scraped on :9102/metrics automatically.

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019d404e-21ec-723a-b211-c619925dd12e
@worstell worstell force-pushed the eworstell/scheduler-git-metrics branch from ef643e8 to eefe9f2 Compare March 30, 2026 20:17
@worstell worstell merged commit 4b29ada into main Mar 30, 2026
7 checks passed
@worstell worstell deleted the eworstell/scheduler-git-metrics branch March 30, 2026 20:30
2 participants