Releases: datum-cloud/activity
@datum-cloud/activity-ui v0.6.0
What's Changed
- refactor(ui): rebuild activity UI on @datum-cloud/datum-ui by @mattdjenkinson in #194
- chore: bump version to minor 0.5.0 by @mattdjenkinson in #196
New Contributors
- @mattdjenkinson made their first contribution in #194
Full Changelog: v0.4.1...v0.6.0
@datum-cloud/activity-ui v0.4.1
What's Changed
- chore: bump ui package version to 0.4.0 by @kevwilliams in #188
Full Changelog: v0.3.4...v0.4.1
v0.3.5
What's Changed
- feat: switch to label-driven releases with workflow_dispatch escape hatch by @kevwilliams in #160
- fix: bump datum-cloud/actions refs to v1.13.1 by @kevwilliams in #180
- chore: sync ui package.json version to latest published tag by @kevwilliams in #181
Full Changelog: v0.3.3...v0.3.5
@datum-cloud/activity-ui v0.3.4
What's Changed
- feat: switch to label-driven releases with workflow_dispatch escape hatch by @kevwilliams in #160
- fix: bump datum-cloud/actions refs to v1.13.1 by @kevwilliams in #180
- chore: sync ui package.json version to latest published tag by @kevwilliams in #181
Full Changelog: v0.3.3...v0.3.4
v0.3.3
This patch release improves alerting reliability. The v0.3.2 SLO alerting introduced a few edge cases — burn rate dashboards could show false 100% spikes on low-traffic endpoints, an alert configuration accidentally silenced unrelated warnings cluster-wide, and a leftover blackhole route was interfering with alert delivery. All three are now fixed.
What's Fixed
- False burn rate spikes are gone. Low-traffic SLO endpoints (metadata, audit queries, activity queries) no longer show phantom 100% error ratios during brief traffic pulses.
- Warning alerts are no longer silenced cluster-wide. A severity-based inhibit rule was scoped too broadly — it now only applies to activity system alerts as originally intended.
- Alert delivery is no longer blocked by a dummy route. A leftover blackhole-style receiver in the AlertmanagerConfig was interfering with alert routing and has been removed.
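To show why a severity-only inhibit rule silences warnings cluster-wide, here is a simplified Python model of Alertmanager-style inhibition. It is a sketch under stated assumptions. It supports exact-match label matchers only. The alert names and the `component` label are hypothetical, not the labels Activity actually uses. It also does not reproduce the configuration changed in #178. It shows only how adding a scoping label confines a rule to its own system.

```python
from dataclasses import dataclass, field


@dataclass
class InhibitRule:
    """Simplified Alertmanager-style inhibit rule (exact-match labels only)."""
    source_match: dict[str, str]
    target_match: dict[str, str]
    equal: list[str] = field(default_factory=list)


def matches(labels: dict[str, str], matchers: dict[str, str]) -> bool:
    """True if every matcher label is present on the alert with the same value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_inhibited(target: dict[str, str], firing: list[dict[str, str]],
                 rules: list[InhibitRule]) -> bool:
    """True if another firing alert mutes `target` under any rule."""
    for rule in rules:
        if not matches(target, rule.target_match):
            continue
        for source in firing:
            if source is target:
                continue  # in this model, an alert never inhibits itself
            same_labels = all(source.get(k) == target.get(k) for k in rule.equal)
            if matches(source, rule.source_match) and same_labels:
                return True
    return False


# Hypothetical alerts; the `component` label is an assumption for illustration.
activity_critical = {"alertname": "ActivityDown", "severity": "critical", "component": "activity"}
activity_warning = {"alertname": "ActivityQuerySlow", "severity": "warning", "component": "activity"}
unrelated_warning = {"alertname": "NodeDiskFilling", "severity": "warning", "component": "node"}
firing = [activity_critical, activity_warning, unrelated_warning]

# Too broad: any critical alert mutes every warning in the cluster.
broad = [InhibitRule({"severity": "critical"}, {"severity": "warning"})]

# Scoped: only activity criticals mute activity warnings.
scoped = [InhibitRule({"severity": "critical", "component": "activity"},
                      {"severity": "warning", "component": "activity"})]

print(is_inhibited(unrelated_warning, firing, broad))   # muted by the broad rule
print(is_inhibited(unrelated_warning, firing, scoped))  # still delivered
```

The broad rule mutes the unrelated node warning as soon as any critical alert fires. The scoped rule still suppresses the activity warning but leaves the node warning alone.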
PRs Included
Alerting Fixes
- #167 — Fix false 100% burn rates for low-traffic SLO endpoints
- #174 — Remove blackhole route from alert inhibitions config
- #178 — Fix inhibit rule silencing all warning alerts
Dependencies
- #161 — Update monaco-editor to ^0.52.0 || ^0.55.0
- #163 — Update golang.org/x/term to v0.41.0
- #164 — Update golang.org/x/time to v0.15.0
- #168 — Update actions/checkout to v6
- #169 — Update @rollup/plugin-commonjs to v29
- #172 — Update k8s.io/kube-openapi digest
Full Changelog: v0.3.2...v0.3.3
v0.3.2
Activity v0.3.2 is all about reliability. After two weeks of running in production, we analyzed the alerting in depth and found that most alerts were false positives while real issues went undetected. This release fixes that.
Smarter Alerting
The old alerts fired constantly on low-traffic streams because they couldn't tell the difference between "nothing happening" and "something broken." The new alerts check for actual backlogs before raising alarms, and noisy alerts are now suppressed when a root-cause alert already explains the problem.
- Replaced two constantly firing NATS alerts with backlog-aware versions
- Fixed an alert that could never fire due to a wrong label selector
- Added inhibition rules so one outage doesn't trigger a cascade of 10 alerts
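The gap between "nothing happening" and "something broken" can be sketched as two predicates over a consumer's state. This is a hypothetical Python model, not Activity's actual alert rules. `ConsumerSnapshot`, its fields, and the `min_pending` threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConsumerSnapshot:
    """Hypothetical per-window stats for one NATS consumer."""
    pending: int          # messages waiting to be delivered
    acked_in_window: int  # messages acknowledged during the evaluation window


def naive_stalled(s: ConsumerSnapshot) -> bool:
    """Old-style check: zero throughput is treated as a failure, even when idle."""
    return s.acked_in_window == 0


def backlog_stalled(s: ConsumerSnapshot, min_pending: int = 1) -> bool:
    """Backlog-aware check: a stall only if work is queued and none is being done."""
    return s.pending >= min_pending and s.acked_in_window == 0


idle = ConsumerSnapshot(pending=0, acked_in_window=0)     # quiet stream, nothing to do
stuck = ConsumerSnapshot(pending=250, acked_in_window=0)  # work queued, nothing processed
busy = ConsumerSnapshot(pending=40, acked_in_window=900)  # healthy, keeping up

for name, snap in [("idle", idle), ("stuck", stuck), ("busy", busy)]:
    print(name, naive_stalled(snap), backlog_stalled(snap))
```

Only the backlog-aware predicate stays quiet on the idle stream while still flagging the stuck one.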
SLO-Based Monitoring
Instead of arbitrary thresholds, the service now tracks five SLOs with burn-rate alerting that catches problems early without crying wolf:
- Metadata operations, audit queries, activity queries, event queries, and overall availability
- A new Grafana dashboard shows error budget remaining and burn rate trends
- Validated against a real production incident — would have detected it 2 hours earlier
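Burn rate measures how fast an error budget is being spent: the observed error ratio divided by the budget (1 minus the SLO target). A minimal multi-window sketch follows. It pages only when both a long and a short window burn fast. The 14.4 threshold for a 1h/5m window pair follows the Google SRE Workbook's suggestion. The 99.9% target and the traffic numbers are hypothetical, and Activity's real windows and thresholds are not stated in these notes. The optional `min_requests` guard illustrates one common mitigation for the low-traffic false spikes addressed in v0.3.3 (#167). It is not necessarily the approach that PR took.

```python
def burn_rate(errors: int, total: int, slo_target: float, min_requests: int = 0) -> float:
    """Rate of error-budget spend for one window; 1.0 means exactly on budget.

    Windows with fewer than `min_requests` requests report 0.0, so one failed
    request on a quiet endpoint cannot masquerade as a 100% error ratio.
    """
    if total == 0 or total < min_requests:
        return 0.0
    error_ratio = errors / total
    return error_ratio / (1.0 - slo_target)


def should_page(long_window: tuple[int, int], short_window: tuple[int, int],
                slo_target: float, threshold: float = 14.4, min_requests: int = 0) -> bool:
    """Multi-window check: page only if both windows burn at >= `threshold`.

    Each window is an (errors, total) pair, e.g. the last 1h and the last 5m.
    """
    long_burn = burn_rate(*long_window, slo_target, min_requests)
    short_burn = burn_rate(*short_window, slo_target, min_requests)
    return long_burn >= threshold and short_burn >= threshold


SLO = 0.999  # hypothetical 99.9% availability target

quiet = ((1, 3), (1, 1))                 # 1 error in 3 requests (1h), 1 in 1 (5m)
incident = ((500, 20_000), (60, 1_500))  # sustained 2.5% to 4% error ratio

print(should_page(*quiet, SLO))                    # no guard: false page
print(should_page(*quiet, SLO, min_requests=100))  # guard suppresses it
print(should_page(*incident, SLO, min_requests=100))
```

Requiring both windows to exceed the threshold keeps the alert from firing on a brief blip. The short window lets it resolve quickly once the problem stops.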
Better Coverage for Real Failures
Added alerts for failure modes that went undetected in production:
- NATS consumer backlog growth
- Vector receiving events but failing to write them to storage
- Slow-leak DLQ accumulation from broken policies
- ClickHouse concurrency saturation (a 30-minute early warning before latency degrades)
- Audit query timeout errors
UI Improvements
- New timeline view variant for the activity feed
- Debounced search for better typing experience
- Fixed UTC timestamp display
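Debouncing sends a search request only after the user pauses typing, not once per keystroke. The activity UI is a React component library, and its actual delay is not stated in these notes. What follows is a deterministic Python simulation of the idea only. The function name, the 300 ms delay, and the keystroke timings are assumptions.

```python
def debounced_queries(keystrokes: list[tuple[int, str]], delay_ms: int) -> list[str]:
    """Return the search queries a debounced input would actually send.

    `keystrokes` is a time-ordered list of (timestamp_ms, input_value) pairs.
    Each keystroke restarts the timer; a value is sent only if no newer
    keystroke arrives within `delay_ms` (or it is the final keystroke).
    """
    sent = []
    for i, (ts, value) in enumerate(keystrokes):
        next_ts = keystrokes[i + 1][0] if i + 1 < len(keystrokes) else None
        if next_ts is None or next_ts - ts >= delay_ms:
            sent.append(value)
    return sent


# Hypothetical typing: "audit" quickly, a pause, then "-log" quickly.
typing = [(0, "a"), (80, "au"), (160, "aud"), (240, "audi"), (320, "audit"),
          (1200, "audit-"), (1280, "audit-l"), (1360, "audit-lo"), (1440, "audit-log")]

print(debounced_queries(typing, delay_ms=300))
```

Nine keystrokes produce only two requests, one at each pause.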
Under the Hood
- Increased apiserver CPU limits to prevent throttling under load
- Excluded long-lived WATCH connections from latency metrics (they were pinning dashboards to 60s)
- Security fix for the MCP SDK dependency
- Updated Go to 1.26, Kubernetes libraries to v0.35.3, and ClickHouse client to v2.43.0
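A toy nearest-rank p99 shows why long-lived WATCH connections distort latency panels. A watch stays open for as long as the connection lasts, so its "latency" reflects connection lifetime rather than request speed. The (verb, seconds) samples and the 60-second watch duration below are illustrative. The sketch mimics a filter on the request verb; it does not reproduce Activity's actual metrics queries.

```python
import math


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


def request_latency_p99(requests: list[tuple[str, float]],
                        exclude_verbs: tuple[str, ...] = ("WATCH",)) -> float:
    """p99 over (verb, seconds) samples, skipping long-lived verbs."""
    samples = [secs for verb, secs in requests if verb not in exclude_verbs]
    return percentile(samples, 99)


# Hypothetical traffic: fast GETs plus a few watches held open for 60s.
requests = [("GET", 0.05)] * 97 + [("WATCH", 60.0)] * 3

print(request_latency_p99(requests, exclude_verbs=()))  # watches included
print(request_latency_p99(requests))                    # watches excluded
```

Just 3% of samples being watches is enough to pin the p99 at the watch duration.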
PRs Included
Observability
- #141 — Reduce alert noise and increase apiserver CPU headroom
- #147 — Add SLO burn-rate alerting and error budget dashboard
- #152 — Reduce alert noise during cascading failures
- #153 — Catch silent failures that slipped past existing alerts
- #154 — Add early warning alert for ClickHouse query queueing
UI
- #142 — Timeline variant, debounced search, and UTC timestamp fix
Security
- #143 — Update MCP SDK to v1.4.1 (security fix)
Dependencies
- #150, #134, #127, #128, #135, #121, #118, #117, #159, #158, #157, #156, #151, #149, #148, #131, #130, #123, #122
Full Changelog: v0.3.1...v0.3.2
v0.3.1
What's Changed
Fixed an issue where staff users couldn't manage activity policies in production. The ActivityPolicy resource was accidentally left out of the deployment configuration, so the IAM system didn't know it existed. This release ensures it gets deployed alongside all other activity resources.
v0.3.0 — The "What Just Happened?" Release
Activity v0.3.0 goes from "search your audit logs" to "understand everything happening in your control plane." This is the biggest release yet — here's what's new.
Human-Readable Activity Feed
Nobody wants to read raw audit logs. Activity now translates them into plain-language summaries like "Alice created HTTPProxy for myservice.com." You write the rules, Activity does the translating. Check out the activity policies guide to get started.
Control Plane Events — Now Searchable
Those fleeting status messages that vanish after an hour? They're now stored and searchable right alongside your audit logs. No more "I swear I saw an error earlier but now it's gone." The control plane events guide walks through how to query them.
Reprocess History
Added a new translation rule but want it to cover last week too? ReindexJob backfills your activity feed from stored history so you're not missing the past. See the ReindexJob guide for details.
A UI for Exploring Activity
A new React component library (@datum-cloud/activity-ui) gives you an embeddable activity feed, policy editor, and event explorer — ready to drop into your platform.
Ask Your AI Assistant What Happened
Activity now works with AI tools like Claude through MCP. Ask "what changed in networking last week?" and get real answers pulled from your activity data. The new Claude Code plugin adds guided workflows for incident investigation and policy authoring. The MCP server guide has setup instructions.
Real-Time Streaming
Dashboards can now receive instant updates as things happen — no more polling and refreshing.
Everything Else
- Structured logging across all components
- Grafana dashboards ship out of the box
- Failed message auto-retry with monitoring
- TLS support for internal connections
- Access control roles for every resource
- Redesigned CLI with clearer subcommands
- Faster event queries with optimized storage layout
Full release notes | Full Changelog: v0.2.0...v0.3.0
v0.2.0
What's Changed
- feat: view resource history over time by @scotwells in #29
Full Changelog: v0.1.0...v0.2.0
v0.1.0
What's Changed
- Add ClickHouse database for audit log storage by @scotwells in #1
- feat: add aggregated API server implementation by @scotwells in #2
- feat: functional end-to-end testing environment by @scotwells in #4
- feat: introduce the activity CLI by @scotwells in #5
- chore(deps): bump the go_modules group across 1 directory with 2 updates by @dependabot[bot] in #3
- feat: publish docker image and kustomize bundle by @scotwells in #6
- fix: use tagged test-infra repo for taskfile by @scotwells in #7
- feat: support secure clickhouse connections by @scotwells in #8
- feat: securely connect apiserver to clickhouse by @scotwells in #10
- feat: use clickhouse client config file for SSL configuration in clickhouse migrations by @scotwells in #9
- feat: collect metrics from clickhouse databases by @scotwells in #11
- feat: add milo IAM configurations by @scotwells in #12
- feat: support custom ca request header authentication by @scotwells in #13
- feat: support external cluster api aggregation by @scotwells in #14
- fix: adjust sidecar webhook processing by @scotwells in #15
- feat: support passing UID in aggregated apiserver requests by @scotwells in #16
- feat: calculate pipeline delay by @scotwells in #18
- feat: use factory based configurations by @scotwells in #17
- feat: performance testing foundation by @scotwells in #20
- fix: optimize performance for platform-wide querying by @scotwells in #23
- feat: setup observability tooling by @scotwells in #24
- feat: remove private IPs from audit logs by @scotwells in #25
- feat: filter by user uid by @scotwells in #26
- feat: highly available clickhouse deployment by @scotwells in #27
- feat: add more validation for filtering by @scotwells in #28
New Contributors
- @scotwells made their first contribution in #1
- @dependabot[bot] made their first contribution in #3
Full Changelog: https://github.com/datum-cloud/activity/commits/v0.1.0