coco: initial integration for Confidential Containers and Trustee operators#80

Open
beraldoleal wants to merge 17 commits into validatedpatterns:main from beraldoleal:integration-v2

@beraldoleal beraldoleal commented Dec 8, 2025

See the individual commits for messages.

@beraldoleal beraldoleal force-pushed the integration-v2 branch 13 times, most recently from 29c9c84 to 341c962 Compare December 10, 2025 23:06
@beraldoleal beraldoleal force-pushed the integration-v2 branch 10 times, most recently from 5074bb3 to 74e2c74 Compare December 17, 2025 01:08
@beraldoleal beraldoleal marked this pull request as ready for review December 17, 2025 01:08
Collaborator

@sabre1041 sabre1041 left a comment

This is a good start. A few issues arose during the review:

  1. ZTWIM GA reconciles changes so the imperative configurations applied here are reverted immediately
  2. There is no mention about applying labels to nodes. Otherwise the sample workload fails to be scheduled
  3. There should be a callout about the instance types that may need to be configured. I tested in eastasia region and the configured instance was not available
  4. Additional comments inline

@beraldoleal
Author

> ZTWIM GA reconciles changes so the imperative configurations applied here are reverted immediately

We fixed it by adding the CREATE_ONLY_MODE=true env var to the ZTWIM operator via the OLM subscription config in values-coco-dev.yaml.

> There is no mention about applying labels to nodes. Otherwise the sample workload fails to be scheduled

We removed the nodeSelector entirely. Peer pods run as VMs, not directly on worker nodes, so the label is unnecessary for now.

> There should be a callout about the instance types that may need to be configured. I tested in eastasia region and the configured instance was not available

I will add a proper CONFIDENTIAL-CONTAINERS.md file.

@beraldoleal
Author

Hey @sabre1041, @butler54, @bpradipt ... let's give this a second shot! I addressed all the comments from the previous review. Feel free to reopen any or add new ones.

This was tested on Azure with AMD SEV-SNP (DCasv6 / Genoa), OCP 4.20.8, using the ZTWIM operator stable-v1 channel, sandbox operator v1.11.0, and trustee operator v1.0.0.

The chart references still point to custom branches while we wait for @butler54's PRs. Once those PRs merge, I will update the references. Hopefully that won't be a blocker for review.

@beraldoleal beraldoleal force-pushed the integration-v2 branch 2 times, most recently from a67a701 to c40eff1 Compare March 6, 2026 14:58
@beraldoleal
Author

beraldoleal commented Mar 6, 2026

@sabre1041 @butler54 @bpradipt no more fork references. It now uses the official validatedpatterns/charts release versions.

Collaborator

@butler54 butler54 left a comment

There are a few hardcoded vars that definitely need to change.
The biggest question is the use of the imperative framework to generate certificates. If this can be moved to generation in cert-manager, I think that would be more 'kube friendly'.

The justification for this is that the imperative framework is always the second-to-last option (the last option being work done on the developer workstation).

Comment on lines +1 to +2
---

Collaborator

Flag for future work - We should create an issue to add this playbook to the VP ansible collection. @mhjacks

# Generate SPIRE x509pop certificates for CoCo integration
# Creates CA certificate and agent certificates for all workloads

- name: Generate SPIRE x509pop certificates
Collaborator

I want to check whether we should be using cert-manager or the Ansible approach here.

To me it would make a lot more sense to use cert-manager, if we can. Then we'd have less janking around to get things done (still a non-zero amount of janking).

Author

@beraldoleal beraldoleal Mar 11, 2026

Agreed, cert-manager would be cleaner. I went the imperative route to unblock testing faster. The main friction is that SPIRE expects the CA as a ConfigMap and cert-manager outputs Secrets. I will explore this in a follow-up PR.

echo "=== DEBUG: Testing network connectivity to KBS (cluster-internal) ==="
curl -k -I https://kbs-service.trustee-operator-system.svc.cluster.local:8080 2>&1 | head -20
echo "=== DEBUG: Testing network connectivity to KBS (public route) ==="
curl -k -I https://kbs.apps.bleal-vp.azure.sandboxedcontainers.com 2>&1 | head -20
Collaborator

This is hardcoded. You should be able to template this out using the VP-provided var for the cluster domain.

Comment on lines +55 to +59
ansible.builtin.shell: |
  hash=$(sha256sum "{{ rendered_path }}" | cut -d' ' -f1)
  initial_pcr=0000000000000000000000000000000000000000000000000000000000000000
  echo -n "$initial_pcr$hash" | python3 -c "import sys,hashlib; print(hashlib.sha256(bytes.fromhex(sys.stdin.read())).hexdigest())"
register: pcr8_hash
Collaborator

Did you get the python script to demonstrably work? This needs to be backported into the coco-pattern to avoid a custom container.

Author

yes, this works. And that is the plan.
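
For anyone following this thread, here is a minimal sketch of what the shell-plus-python3 one-liner under review appears to compute: a single PCR extend, i.e. SHA-256 over the initial all-zero register concatenated with the SHA-256 digest of the rendered file. The names `INITIAL_PCR`, `pcr_extend` and `pcr8_for_file` are hypothetical and do not come from the PR; this is an illustration under those assumptions, not the code that would be backported.

```python
import hashlib

# Assumption: the register starts as 32 zero bytes, matching the 64 hex
# zeros assigned to initial_pcr in the reviewed shell snippet.
INITIAL_PCR = bytes(32)


def pcr_extend(current_pcr: bytes, data: bytes) -> bytes:
    """Return SHA-256(current_pcr || SHA-256(data))."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(current_pcr + digest).digest()


def pcr8_for_file(path: str) -> str:
    """Hex value after extending the zero register once with the file at `path`."""
    with open(path, "rb") as handle:
        return pcr_extend(INITIAL_PCR, handle.read()).hex()
```

Calling `pcr8_for_file` on the rendered file should give the same hex string as the `sha256sum` pipeline above, since both hash the raw zero bytes followed by the raw file digest.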

sabre1041 and others added 17 commits March 11, 2026 10:00
Signed-off-by: Andrew Block <andy.block@gmail.com>
Signed-off-by: Manuel Lorenzo <mlorenzofr@redhat.com>
…idatedpatterns#89)

* Initial architecture diagrams for the ZTVP. It shows the layered
approach for managing the development of independent use cases and
components, and the initial use case logical and schematic structures.
Includes the draw.io source and pages rendered as png images at the time
of commit.

* Updated architecture diagrams for the ZTVP. Corrections added based on
PR feedback: 1) added missing sidecar; 2) added duplicate connections;
3) moved csi driver into app space; 4) minor technical corrections for
spelling, stray drawing objects, aligning objects. Most updates to use
case 12 logical and schematic drawings. Includes the draw.io source and
pages rendered as png images at the time of commit.
Signed-off-by: Andrew Block <andy.block@gmail.com>
…ns#94)

Bumps [ansible/ansible-lint](https://github.com/ansible/ansible-lint) from 25 to 26.
- [Release notes](https://github.com/ansible/ansible-lint/releases)
- [Commits](ansible/ansible-lint@v25...v26)

---
updated-dependencies:
- dependency-name: ansible/ansible-lint
  dependency-version: '26'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 5 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v5...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…tedpatterns#83)

* feat: add global certificate management with secretRef and extraValueFiles

Implements comprehensive certificate management for ZTVP:

Certificate Sources:
- Primary custom CA via secretRef (customCA.secretRef)
- Additional certificates via extraValueFiles (overrides/values-ztvp-certificates.yaml)
- Auto-detected proxy CA from trusted-ca-bundle (openshift-config-managed)
- Auto-detected ingress CA from all IngressControllers (not just default)
- Auto-detected service CA from openshift-service-ca

Features:
- Initial Job for immediate certificate extraction on install
- CronJob for periodic certificate rotation (daily at 2 AM)
- Warning and continue behavior for missing additional certificates
- Automatic rollout restart for consuming applications (labeled strategy)
- ACM Policy distribution to target namespaces

Configuration:
- Use extraValueFiles for complex nested structures (additionalCertificates, rollout)
- Simple overrides via values-hub.yaml for flat key-value pairs

Signed-off-by: Min Zhang <minzhang@redhat.com>

* feat: add secure Java truststore for qtodo with Vault integration

- Add init container to convert CA bundle to Java JKS truststore
- Add ExternalSecret for truststore password from Vault
- Configure JAVA_TOOL_OPTIONS for JVM-level SSL truststore
- Mount ztvp-trusted-ca ConfigMap for CA certificates
- Enable truststore by default when SPIRE is enabled

Signed-off-by: Min Zhang <minzhang@redhat.com>

* feat: use PKCS12 truststore with jshell bulk import for qtodo

- Use jshell with Java KeyStore API for truststore creation
- Single JVM startup instead of 149 keytool invocations
- Bulk import all CA certificates in one operation
- Significantly faster truststore creation (~2-5s vs 30-60s)

Signed-off-by: Min Zhang <minzhang@redhat.com>

* refactor: move truststore creation script to separate Java file

Move inline jshell script to a proper Java file (qtodo-truststore.java)
loaded via ConfigMap. This improves code clarity and maintainability.

Changes:
- Add charts/qtodo/files/qtodo-truststore.java with proper Java class
- Add qtodo-truststore-config.yaml to load the Java file
- Update app-deployment.yaml to use 'java /usr/local/bin/qtodo-truststore.java'
- Update superlinter.yml to exclude the Java file from linting
- Add conditionals for truststore.enabled to control init container and volumes
- Refactor Helm conditionals to use shorter syntax (remove 'eq ... true')

Suggested-by: Manuel Lorenzo <mlorenzofr@redhat.com>
Signed-off-by: Min Zhang <minzhang@redhat.com>

---------

Signed-off-by: Min Zhang <minzhang@redhat.com>
…patterns#96)

Reorganize Vault secrets into segmented paths for least-privilege access:

Secret Path Structure:
- apps/<app-name>/ - Application-specific secrets (e.g., apps/qtodo/)
- hub/infra/<component>/ - Infrastructure secrets (e.g., hub/infra/keycloak/)
- global/ - Shared secrets (unchanged)
- hub/ - Hub-level secrets (unchanged)

Policy Naming Convention:
- K8s auth policies: <path>-k8s-secret (for ClusterSecretStore/ExternalSecrets)
- JWT auth policies: <path>-jwt-secret (for SPIFFE workload identity)

Changes:
- Update vaultPrefixes in values-secret.yaml.template for new paths
- Update ExternalSecret references in chart values.yaml files
- Add JWT policies to values-hub.yaml for SPIFFE workload authentication
- Pass JWT policies to vault-config-jwt ansible task via vault-utils.sh

This enables application-level secret isolation where each app only has
access to its own secrets, following zero-trust principles.

Depends on: rhvp/rhvp.cluster_utils PR for auto-creating K8s auth policies

Signed-off-by: Min Zhang <minzhang@redhat.com>
…generated pull secrets (validatedpatterns#97)

Signed-off-by: Andrew Block <andy.block@gmail.com>
This adds initial integration for Confidential Containers and Trustee
Operators as a separate clustergroup.

Co-authored-by: Chris Butler <chris.butler@redhat.com>
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Add automated configuration for SPIRE Server x509pop NodeAttestor plugin
required for CoCo peer-pods attestation.

CoCo peer-pods run on untrusted cloud infrastructure. Using k8s_psat
would require trusting the cloud provider's cluster. Instead, pods
perform hardware TEE attestation to KBS to obtain x509 certificates as
cryptographic proof of running in genuine confidential hardware, then
use x509pop to register with SPIRE.

The Red Hat SPIRE Operator's SpireServer CRD does not expose x509pop
configuration, requiring a ConfigMap patch via this imperative job.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
Add hello-coco Helm chart demonstrating SPIRE agent deployment in
confidential containers using x509pop node attestation. The chart
deploys a test pod in a CoCo peer-pod (confidential VM with AMD SNP or
Intel TDX) that fetches SPIRE agent certificates from KBS after TEE
attestation, establishing hardware as the root of trust instead of
Kubernetes.

The pod contains three containers: init container fetches sealed
secrets from KBS, SPIRE agent uses x509pop for node attestation, and
test workload receives SPIFFE SVIDs via unix attestation. This
validates the complete integration flow between ZTVP and CoCo
components.

Note: This could be dropped, if we stick with only the todoapp.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Basic markdown file with deployment steps.

Signed-off-by: Beraldo Leal <bleal@redhat.com>