[consul] Make health check status cache size and TTL configurable #23603

Open
mwdd146980 wants to merge 5 commits into master from mwdd146980/consul-configurable-ttl-cache


@mwdd146980 mwdd146980 commented May 5, 2026

Motivation

ConsulCheck keeps an in-memory TTLCache keyed by (check_id, service_id, service_name, node_name) to detect health check status transitions and emit consul.check_failed events. Both the size (5000) and TTL (3600s) have been hardcoded since the cache was introduced.

Customers with large Consul clusters that report more than 5000 distinct health checks hit eviction. Evicted entries look new on the next run, which causes missed transitions and re-emitted failure events.
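The eviction problem described above can be reproduced with a small sketch. It uses a hypothetical `BoundedCache` stand-in (size-bounded only, TTL expiry omitted) rather than the real TTLCache, and it assumes that a check seen as critical with no cached previous status is reported as a failure. Both `BoundedCache` and `run_check` are illustrative names, not part of ConsulCheck.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical size-bounded cache; evicts the oldest entry when full."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the oldest entry


def run_check(cache, statuses):
    """Return the keys that would produce a failure event on this run.

    Assumption: an entry with no cached previous status looks new, so a
    critical status on it is reported as a fresh failure.
    """
    events = []
    for key, status in statuses.items():
        previous = cache.get(key)
        if status == "critical" and previous != "critical":
            events.append(key)
        cache.set(key, status)
    return events


# Keys follow the (check_id, service_id, service_name, node_name) shape.
statuses = {("check-%d" % i, "svc-1", "web", "node-a"): "critical" for i in range(3)}

too_small = BoundedCache(maxsize=2)
first_run = run_check(too_small, statuses)
second_run = run_check(too_small, statuses)   # nothing changed, yet events repeat

large_enough = BoundedCache(maxsize=3)
run_check(large_enough, statuses)
steady_state = run_check(large_enough, statuses)  # no repeated events
```

With three failing checks and a cache of size two, every run evicts an entry that the next lookup needs, so the same failures are reported again; a cache that holds all three reports them once.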

Closes AGENT-16145.

Approach

Two new instance/init_config options, falling back to the existing defaults:

  • health_checks_cache_size (default 5000)
  • health_checks_cache_ttl (default 3600)

Both are fleet_configurable and constrained to minimum: 1 in the spec. Pydantic codegen produces Field(None, ge=1), so non-positive values are rejected at first check() via check_initializations with a clear error naming the bad field.
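The effect of the `ge=1` constraint can be approximated in plain Python. This is only a sketch of the behavior described above: the actual validation is done by the generated Pydantic models, and `validate_minimum` is a hypothetical helper whose error wording is illustrative.

```python
def validate_minimum(name, value, minimum=1):
    """Hypothetical helper mirroring a `ge=1` constraint on an optional field.

    None means the option was not set, so the default applies elsewhere.
    """
    if value is None:
        return None
    if value < minimum:
        # Name the offending field so the error is actionable.
        raise ValueError(
            "%s must be greater than or equal to %d, got %r" % (name, minimum, value)
        )
    return value


validate_minimum("health_checks_cache_size", 20000)  # accepted

try:
    validate_minimum("health_checks_cache_ttl", 0)
except ValueError as exc:
    message = str(exc)  # the message names health_checks_cache_ttl
```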

The instance/init_config/default precedence matches the existing max_services / threads_count pattern.
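A minimal sketch of that precedence, assuming the instance value wins over init_config, which in turn wins over the built-in default. `resolve_option` and the constant names are hypothetical and are not taken from the check's source.

```python
DEFAULT_CACHE_SIZE = 5000  # prior hardcoded size
DEFAULT_CACHE_TTL = 3600   # prior hardcoded TTL, in seconds


def resolve_option(name, instance, init_config, default):
    """Return the instance value if set, else init_config, else the default."""
    for source in (instance, init_config):
        value = source.get(name)
        if value is not None:
            return value
    return default


init_config = {"health_checks_cache_size": 10000}
instance = {"health_checks_cache_ttl": 7200}

cache_size = resolve_option(
    "health_checks_cache_size", instance, init_config, DEFAULT_CACHE_SIZE
)
cache_ttl = resolve_option(
    "health_checks_cache_ttl", instance, init_config, DEFAULT_CACHE_TTL
)
```

Here the size comes from init_config because the instance does not set it, and the TTL comes from the instance.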

Verification

  • ddev --no-interactive test consul: 28 passed, 3 skipped.
  • ddev test -fs consul: clean.
  • TDD: four new unit tests cover defaults, instance override, init_config override, and the eviction-causes-event-re-emit regression with a small cache.

🤖 Generated with Claude Code

Adds health_checks_cache_size and health_checks_cache_ttl options so users
with large Consul clusters (>5000 distinct health checks) can size the
in-memory transition-detection cache appropriately. Defaults preserve the
prior 5000/3600 behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mwdd146980 mwdd146980 self-assigned this May 5, 2026
codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.04%. Comparing base (891aa96) to head (05fd936).


datadog-prod-us1-3 Bot commented May 5, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 92.56% (+5.33%)

Commit SHA: 05fd936

…fig path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mwdd146980 (Contributor, Author) commented:

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36e7b7e10d


Comment thread consul/datadog_checks/consul/consul.py

dd-octo-sts Bot commented May 5, 2026

Validation Report

All 20 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and Codecov settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |


@mwdd146980 mwdd146980 marked this pull request as ready for review May 5, 2026 20:33
@mwdd146980 mwdd146980 requested review from a team as code owners May 5, 2026 20:33