Add Kepler power monitoring for TNF clusters #52

Open
lucaconsalvi wants to merge 2 commits into openshift-eng:main from lucaconsalvi:shift-week-kepler-demo

@lucaconsalvi

Summary

  • Add Kepler v0.11.3 deployment automation (Ansible role + playbook + shell scripts) for power
    monitoring on TNF clusters
  • Include Grafana with a pre-configured TNF Power Monitoring dashboard for visualization
  • Add a Claude Code skill (/tnf-power) that queries Kepler metrics via Prometheus and generates
    a power consumption report
  • Add documentation covering Kepler architecture, usage, and a presentation guide

Details

Deployment

  • kepler.yml playbook with kepler Ansible role handling namespace, RBAC, DaemonSet,
    ServiceMonitor, and user workload monitoring setup
  • Grafana deployment with a custom dashboard (per-node power, control plane breakdown, top
    containers)
  • make deploy-kepler / make remove-kepler targets and corresponding shell scripts
  • Supports both install and removal (removal via -e kepler_state=absent)
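
As a rough illustration of the install/removal toggle above, the sketch below builds the `ansible-playbook` command line for either case. The helper name `kepler_playbook_cmd`, the `present` state name, and the absence of an inventory argument are assumptions for illustration. Only `kepler.yml` and `-e kepler_state=absent` come from this PR.

```python
import shlex


def kepler_playbook_cmd(state: str = "present") -> list[str]:
    """Build the ansible-playbook argv for installing or removing Kepler.

    Hypothetical helper: "present" is an assumed name for the default
    (install) case and is never passed to Ansible; the PR only documents
    kepler_state=absent for removal.
    """
    if state not in ("present", "absent"):
        raise ValueError(f"unsupported kepler_state: {state!r}")
    cmd = ["ansible-playbook", "kepler.yml"]
    if state == "absent":
        # Removal is requested through an extra var, as described in the PR.
        cmd += ["-e", "kepler_state=absent"]
    return cmd


if __name__ == "__main__":
    # Print the removal command as a single shell-quoted string.
    print(shlex.join(kepler_playbook_cmd("absent")))
```

In practice the `make deploy-kepler` and `make remove-kepler` targets presumably wrap commands like these; any inventory or extra arguments they pass are not shown in this description.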

Claude Code Skill

  • /tnf-power skill queries Prometheus for node and container CPU power metrics
  • Detects RAPL (bare metal) vs estimated (VM) measurement mode
  • Reports cluster total, per-node breakdown, control plane overhead, and top containers
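
To make the report structure more concrete, here is a minimal sketch of how per-node and per-container samples could be aggregated into a cluster total, a per-node breakdown, and a top-containers list. It is not the skill's actual implementation. The function name `summarize_power`, the `container_name` label, and the sample values are assumptions; the input shape follows the Prometheus HTTP API instant-vector format. RAPL detection and the control plane breakdown are left out.

```python
from collections import defaultdict


def summarize_power(node_samples, container_samples, top_n=3):
    """Aggregate Prometheus instant-query results into a small power report.

    Each sample uses the Prometheus HTTP API vector format:
    {"metric": {<labels>}, "value": [<timestamp>, "<value as string>"]}.
    Values are assumed to be watts.
    """
    per_node = defaultdict(float)
    for sample in node_samples:
        # Sum all series reported for the same node (e.g. multiple zones).
        node = sample["metric"].get("instance", "unknown")
        per_node[node] += float(sample["value"][1])

    per_container = defaultdict(float)
    for sample in container_samples:
        # "container_name" is an assumed label name for illustration.
        name = sample["metric"].get("container_name", "unknown")
        per_container[name] += float(sample["value"][1])

    top = sorted(per_container.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "cluster_total_watts": round(sum(per_node.values()), 2),
        "per_node_watts": {n: round(w, 2) for n, w in per_node.items()},
        "top_containers": [(name, round(w, 2)) for name, w in top],
    }


if __name__ == "__main__":
    # Made-up sample data, not measurements from a real cluster.
    nodes = [
        {"metric": {"instance": "master-0"}, "value": [0, "12.5"]},
        {"metric": {"instance": "master-1"}, "value": [0, "10.0"]},
    ]
    containers = [
        {"metric": {"container_name": "etcd"}, "value": [0, "3.2"]},
        {"metric": {"container_name": "kube-apiserver"}, "value": [0, "4.1"]},
    ]
    print(summarize_power(nodes, containers, top_n=2))
```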

Documentation

  • docs/kepler/README.md — Setup and usage guide
  • docs/kepler/KEPLER-ARCHITECTURE.md — How Kepler works on TNF clusters
  • docs/kepler/KEPLER-PRESENTATION.md — Demo/presentation walkthrough

Test plan

  • Deploy Kepler on a TNF cluster via make deploy-kepler
  • Verify Kepler exporter pods are running on both nodes
  • Confirm metrics are scraped in Prometheus (2 active targets)
  • Run /tnf-power skill and verify report output
  • Access Grafana dashboard and confirm panels render
  • Remove Kepler via make remove-kepler and verify cleanup
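
The "2 active targets" check could be scripted along the following lines. This is a sketch under stated assumptions: it expects the parsed JSON body of the Prometheus `GET /api/v1/targets` endpoint, and it assumes the Kepler scrape job has "kepler" somewhere in its `job` label. The function name `healthy_kepler_targets` is hypothetical.

```python
def healthy_kepler_targets(targets_response, job_substring="kepler"):
    """Return scrape URLs of healthy active targets whose job mentions Kepler.

    targets_response is the parsed JSON body of Prometheus GET /api/v1/targets.
    Matching on a "kepler" substring in the job label is an assumption.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        target.get("scrapeUrl", "")
        for target in active
        if job_substring in target.get("labels", {}).get("job", "")
        and target.get("health") == "up"
    ]


if __name__ == "__main__":
    # Made-up response trimmed to the fields used above.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "kepler"}, "health": "up",
                 "scrapeUrl": "http://10.0.0.1:28282/metrics"},
                {"labels": {"job": "kepler"}, "health": "up",
                 "scrapeUrl": "http://10.0.0.2:28282/metrics"},
            ]
        },
    }
    urls = healthy_kepler_targets(sample)
    print(f"{len(urls)} healthy Kepler targets")
```

On a two-node TNF cluster, the test plan expects this count to be 2.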

openshift-ci bot requested review from jaypoulz and jeff-roche on March 5, 2026 14:23

openshift-ci bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lucaconsalvi
Once this PR has been reviewed and has the lgtm label, please assign jeff-roche for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
