OCPNODE-4106: Propagate KubeletConfig TLS min version to CRI-O via ContainerRuntime #5672
OCPNODE-4106: Propagate KubeletConfig TLS min version to CRI-O via ContainerRuntime #5672asahay19 wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
/hold |
8db40b4 to
8c13d8f
Compare
We can unhold it, latest 4.22 build now contains crio-1.35 . |
|
/retest |
2b984f7 to
ce2c9c0
Compare
|
/test unit |
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
@asahay19: This pull request references OCPNODE-4106 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds TLS minimum-version propagation from KubeletConfig (with APIServer fallback) into CRI-O via generated drop-ins, wires KubeletConfig and APIServer informers into the container-runtime-config controller and bootstrap paths, and adds helpers and tests to compute, inject, and validate the TLS setting. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
pkg/controller/bootstrap/bootstrap.go (1)
256-262:⚠️ Potential issue | 🟠 MajorBootstrap still skips CRI-O TLS when no
ContainerRuntimeConfigexists.This new path only runs behind
len(crconfigs) > 0, so a fresh cluster that sets TLS throughKubeletConfigor APIServer but never defines aContainerRuntimeConfiggets no CRI-O TLS drop-in at install time. That leaves bootstrap behavior dependent on an unrelated user CR.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/bootstrap/bootstrap.go` around lines 256 - 262, The current conditional skips CRI-O TLS bootstrap when crconfigs is empty; ensure CRI-O TLS drop-ins are always generated by invoking the container runtime bootstrap logic unconditionally (or extracting the TLS-specific logic out of containerruntimeconfig.RunContainerRuntimeBootstrap into a function that runs regardless of crconfigs). Specifically, call or refactor containerruntimeconfig.RunContainerRuntimeBootstrap (or a new RunContainerRuntimeTLSBootstrap) using b.templatesDir, crconfigs (possibly empty), cconfig, pools, kconfigs, apiServer so TLS artifacts are produced even when len(crconfigs)==0, handle the returned configs and error the same way, and append the resulting containerRuntimeConfigs to configs.
🧹 Nitpick comments (1)
pkg/controller/container-runtime-config/helpers_test.go (1)
1401-1512: Cover the actual bootstrap selection path here.This test only exercises
getTLSMinVersionFromKubeletConfigsBootstrap, so it never hits the APIServer fallback ingetTLSMinVersionForPoolBootstrapor the multi-match precedence case. Adding those cases would lock down the new bootstrap behavior much better.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/container-runtime-config/helpers_test.go` around lines 1401 - 1512, The test currently only calls getTLSMinVersionFromKubeletConfigsBootstrap; extend TestGetTLSMinVersionForPool to call the higher-level getTLSMinVersionForPoolBootstrap so you exercise the APIServer fallback and multi-match precedence logic: add a case where kubeletConfigs is nil but an apiserver TLSProfile (use the APIServer object with TLSSecurityProfile set) yields the expected min version, and add a case with multiple KubeletConfigs matching the same pool with conflicting MinTLSVersion values to assert which one wins (matching the precedence implemented in getTLSMinVersionForPoolBootstrap); reference getTLSMinVersionForPoolBootstrap, getTLSMinVersionFromKubeletConfigsBootstrap (and getTLSMinVersionFromAPIServerBootstrap if present) when locating code to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_bootstrap.go`:
- Around line 106-125: The function getTLSMinVersionFromKubeletConfigsBootstrap
is relying on slice iteration order to pick the TLS minimum version, which makes
selection order-dependent; instead, first determine the single effective
KubeletConfig for the pool (by filtering using
metav1.LabelSelectorAsSelector/selector.Matches on
kc.Spec.MachineConfigPoolSelector and kc.Name) and then derive the TLS min
version from that resolved config using ctrlcommon.GetSecurityProfileCiphers.
Implement this by collecting matching kc objects into a slice, pick the
effective one (prefer the explicit resolver if available, e.g. a helper like
ResolveKubeletConfig, or if none exists, choose the newest by
kc.CreationTimestamp), then call GetSecurityProfileCiphers on that single config
and return its minVersion.
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 875-880: The log claims the TLS version came "from KubeletConfig"
even when getTLSMinVersionForPool falls back to the API server; change the code
to record the actual source and log it. Modify getTLSMinVersionForPool to return
both the tlsMinVersion and a source identifier (e.g., "KubeletConfig" or
"APIServer"), update the call site here to unpack (tlsMinVersion, tlsSource),
and replace the klog.Infof message to include tlsSource (e.g., "Propagating TLS
min version %q from %s to CRI-O drop-in for pool %s"); keep usage of
createCRIOTLSDropinFile and appending to configFileList the same.
- Around line 1458-1489: The function getTLSMinVersionFromKubeletConfigs is
non-deterministic because kubeletConfigs from ctrl.kcLister.List are iterated in
arbitrary order; sort the slice by KubeletConfig.Name (add "sort" to imports)
before the for-loop so the selection of tlsMinVersion is deterministic across
runs—i.e., perform sort.Slice(kubeletConfigs, func(i,j int) bool { return
kubeletConfigs[i].Name < kubeletConfigs[j].Name }) then keep the existing logic
(which uses ctrlcommon.GetSecurityProfileCiphers on each kc) to set
tlsMinVersion.
In `@test/e2e-2of2/ctrcfg_test.go`:
- Around line 417-435: Capture the initial TLS drop-in presence in
tlsDropinFileName (as done with initialTLSExists) and after deleting the
KubeletConfig assert that the node returns to that captured state instead of
always expecting the file to be removed; update the post-delete checks (the
block around the delete and the similar block at 510-524) to call fileExists
and, if initialTLSExists was true, verify the drop-in is present and its
tls_min_version value (via getValueFromCrioConfig) has returned to the expected
pre-test value, otherwise if initialTLSExists was false verify the file was
removed.
- Around line 378-407: The test currently ignores AlreadyExists when creating
fixed-name fixtures (defaultMCConfig via helpers.CreateMC and ctrcfgObj created
with ctrcfgName) which makes runs on reused clusters non-deterministic; change
the creation logic to either generate unique names (e.g., append a
UUID/timestamp to mcName and ctrcfgName before calling helpers.CreateMC and
building ctrcfgObj) or, if you must keep fixed names, call
cs.MachineConfigs().Get and cs.ContainerRuntimeConfigs().Get first and validate
the existing object's spec matches the expected spec (and fail the test or
update the object if it differs) instead of silently continuing when
IsAlreadyExists is returned. Ensure you update cleanupFuncs to delete the actual
resource names you create or validate.
---
Duplicate comments:
In `@pkg/controller/bootstrap/bootstrap.go`:
- Around line 256-262: The current conditional skips CRI-O TLS bootstrap when
crconfigs is empty; ensure CRI-O TLS drop-ins are always generated by invoking
the container runtime bootstrap logic unconditionally (or extracting the
TLS-specific logic out of containerruntimeconfig.RunContainerRuntimeBootstrap
into a function that runs regardless of crconfigs). Specifically, call or
refactor containerruntimeconfig.RunContainerRuntimeBootstrap (or a new
RunContainerRuntimeTLSBootstrap) using b.templatesDir, crconfigs (possibly
empty), cconfig, pools, kconfigs, apiServer so TLS artifacts are produced even
when len(crconfigs)==0, handle the returned configs and error the same way, and
append the resulting containerRuntimeConfigs to configs.
---
Nitpick comments:
In `@pkg/controller/container-runtime-config/helpers_test.go`:
- Around line 1401-1512: The test currently only calls
getTLSMinVersionFromKubeletConfigsBootstrap; extend TestGetTLSMinVersionForPool
to call the higher-level getTLSMinVersionForPoolBootstrap so you exercise the
APIServer fallback and multi-match precedence logic: add a case where
kubeletConfigs is nil but an apiserver TLSProfile (use the APIServer object with
TLSSecurityProfile set) yields the expected min version, and add a case with
multiple KubeletConfigs matching the same pool with conflicting MinTLSVersion
values to assert which one wins (matching the precedence implemented in
getTLSMinVersionForPoolBootstrap); reference getTLSMinVersionForPoolBootstrap,
getTLSMinVersionFromKubeletConfigsBootstrap (and
getTLSMinVersionFromAPIServerBootstrap if present) when locating code to update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c1eb8aa-19fe-477d-a5b1-9f8405771346
📒 Files selected for processing (10)
cmd/machine-config-controller/start.gopkg/controller/bootstrap/bootstrap.gopkg/controller/container-runtime-config/container_runtime_config_bootstrap.gopkg/controller/container-runtime-config/container_runtime_config_bootstrap_test.gopkg/controller/container-runtime-config/container_runtime_config_controller.gopkg/controller/container-runtime-config/container_runtime_config_controller_test.gopkg/controller/container-runtime-config/helpers.gopkg/controller/container-runtime-config/helpers_test.gotest/e2e-2of2/ctrcfg_test.gotest/e2e-bootstrap/bootstrap_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 350-352: The handler kubeletConfigUpdated currently enqueues all
ContainerRuntimeConfigs on any update event (including status/metadata-only
changes); change it to cast the old/new params to the KubeletConfig type, return
early if casting fails, and compare only the meaningful spec/config fields (e.g.
the Spec or ConfigData) using a deep-equality check; call
enqueueAllContainerRuntimeConfigs() only when the spec/config actually changed
to avoid full requeues on status-only updates.
In `@test/e2e-2of2/ctrcfg_test.go`:
- Around line 506-509: The poll is using wait.PollImmediate with
getValueFromCrioConfig which currently fails the test on transient node exec
errors, preventing retries; change the approach so the closure passed to
wait.PollImmediate handles transient exec errors instead of causing test fatal:
modify getValueFromCrioConfig (or add a non-fatal variant like
getValueFromCrioConfigNoFail) to return (string, error) rather than calling
t.Fatalf, and update the poll closure in ctrcfg_test.go to call that variant,
returning (false, nil) when it receives a transient exec error so PollImmediate
can retry, and only return an actual error for unrecoverable conditions; apply
the same pattern to the other occurrence around lines 534-535.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6581c020-31df-4b36-980d-d73d4685afa2
📒 Files selected for processing (10)
cmd/machine-config-controller/start.gopkg/controller/bootstrap/bootstrap.gopkg/controller/container-runtime-config/container_runtime_config_bootstrap.gopkg/controller/container-runtime-config/container_runtime_config_bootstrap_test.gopkg/controller/container-runtime-config/container_runtime_config_controller.gopkg/controller/container-runtime-config/container_runtime_config_controller_test.gopkg/controller/container-runtime-config/helpers.gopkg/controller/container-runtime-config/helpers_test.gotest/e2e-2of2/ctrcfg_test.gotest/e2e-bootstrap/bootstrap_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- cmd/machine-config-controller/start.go
- pkg/controller/container-runtime-config/helpers_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (3)
test/e2e-2of2/ctrcfg_test.go (2)
378-381:⚠️ Potential issue | 🟠 MajorWait for leaked fixtures to disappear before recreating them.
These deletes are async, so the creates on Lines 385, 407, and 467 can still race a terminating object and fail with
AlreadyExistson reused clusters. Use unique names per run or poll each resource toNotFoundbefore recreating it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e-2of2/ctrcfg_test.go` around lines 378 - 381, The cleanup deletes for cs.MachineConfigs(), cs.ContainerRuntimeConfigs(), and cs.KubeletConfigs() are asynchronous and may still be terminating when the test later calls create, causing AlreadyExists races; update the test to either use per-run unique resource names (mcName, ctrcfgName, kcName) or, after calling Delete(...), poll the API (e.g., using a short retry loop) until Get(...) returns NotFound for each resource before proceeding to the creates on the later lines (385, 407, 467), ensuring the objects are fully removed prior to recreation.
506-509:⚠️ Potential issue | 🟠 MajorKeep these waits retryable across reboot windows.
getValueFromCrioConfigstill fails the test on transient node exec errors, so thesewait.PollImmediateloops cannot actually retry during reboot windows.Suggested change
+func getValueFromCrioConfigWithError(cs *framework.ClientSet, node corev1.Node, regexKey, confPath string) (string, error) { + out, err := helpers.ExecCmdOnNodeWithError(cs, node, "cat", filepath.Join("/rootfs", confPath)) + if err != nil { + return "", err + } + re := regexp.MustCompile(regexKey) + matches := re.FindStringSubmatch(string(out)) + if len(matches) != 2 || matches[1] == "" { + return "", fmt.Errorf("unable to parse %q from %s on node %s", regexKey, confPath, node.Name) + } + return matches[1], nil +} @@ err = wait.PollImmediate(5*time.Second, 15*time.Minute, func() (bool, error) { - val := getValueFromCrioConfig(t, cs, node, `tls_min_version = "(\S+)"`, tlsDropinPath) - return val == "VersionTLS12", nil + val, err := getValueFromCrioConfigWithError(cs, node, `tls_min_version = "(\S+)"`, tlsDropinPath) + if err != nil { + return false, nil + } + return val == "VersionTLS12", nil }) @@ err = wait.PollImmediate(15*time.Second, 25*time.Minute, func() (bool, error) { exists, err := fileExists(t, cs, node, tlsDropinFileName) if err != nil { return false, nil } if !initialTLSExists { return !exists, nil } if !exists { return false, nil } - val := getValueFromCrioConfig(t, cs, node, `tls_min_version = "(\S+)"`, tlsDropinPath) - return val == initialTLSValue, nil + val, err := getValueFromCrioConfigWithError(cs, node, `tls_min_version = "(\S+)"`, tlsDropinPath) + if err != nil { + return false, nil + } + return val == initialTLSValue, nil })Also applies to: 523-535
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e-2of2/ctrcfg_test.go` around lines 506 - 509, The PollImmediate closures fail fast because getValueFromCrioConfig can return an error on transient node exec failures; modify the anonymous functions used in wait.PollImmediate (the one calling getValueFromCrioConfig for tls_min_version and the similar block at the 523-535 range) to treat execution errors as retryable by returning (false, nil) when getValueFromCrioConfig returns an error, and only return (true, nil) when the desired value is observed (or return the error only for terminal/non-transient failures), so the wait loop can continue across reboot windows.pkg/controller/container-runtime-config/container_runtime_config_bootstrap.go (1)
108-130:⚠️ Potential issue | 🟠 MajorA newer matching
KubeletConfigstill can't clear an older TLS override.Line 111 filters out
TLSSecurityProfile == nilbefore choosingnewest. If the newest matchingKubeletConfigremoves TLS settings, this helper ignores it and keeps returning an older TLS floor instead of falling back to APIServer/defaults.Suggested change
func getTLSMinVersionFromKubeletConfigsBootstrap(kubeletConfigs []*mcfgv1.KubeletConfig, pool *mcfgv1.MachineConfigPool) string { var newest *mcfgv1.KubeletConfig for _, kc := range kubeletConfigs { - if kc.Spec.TLSSecurityProfile == nil { - continue - } selector, err := metav1.LabelSelectorAsSelector(kc.Spec.MachineConfigPoolSelector) if err != nil { klog.Warningf("invalid label selector in KubeletConfig %s: %v", kc.Name, err) continue } if selector.Empty() || !selector.Matches(labels.Set(pool.Labels)) { continue } if newest == nil || kc.CreationTimestamp.After(newest.CreationTimestamp.Time) { newest = kc } } - if newest == nil { + if newest == nil || newest.Spec.TLSSecurityProfile == nil { return "" } minVersion, _ := ctrlcommon.GetSecurityProfileCiphers(newest.Spec.TLSSecurityProfile) return minVersion }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/container-runtime-config/container_runtime_config_bootstrap.go` around lines 108 - 130, The helper getTLSMinVersionFromKubeletConfigsBootstrap currently skips KubeletConfigs with TLSSecurityProfile==nil which prevents a newer config that clears TLS settings from being chosen; change the selection logic to consider all matching kubeletConfigs (do not continue on TLSSecurityProfile==nil) so the loop can pick the newest matching kc for the pool, then after newest is chosen if newest.Spec.TLSSecurityProfile == nil return "" else call ctrlcommon.GetSecurityProfileCiphers(newest.Spec.TLSSecurityProfile) to obtain and return the minVersion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_bootstrap.go`:
- Around line 57-63: The TLS drop-in is only added when rendering user
ContainerRuntimeConfig, so pools without a ContainerRuntimeConfig never get
01-ctrcfg-tlsMinVersion; move the tlsMinVersion retrieval and append logic (use
getTLSMinVersionForPoolBootstrap(pool, kubeletConfigs, apiServer) and
createCRIOTLSDropinFile) out of the ContainerRuntimeConfig-specific block and
into the per-pool rendering path where configFileList is assembled for every
pool (ensuring you still append to the same configFileList and avoid duplicating
entries if a user ContainerRuntimeConfig already provided the same drop-in).
In `@test/e2e-2of2/ctrcfg_test.go`:
- Around line 473-487: The current wait.PollImmediate call only checks for the
TLS drop-in file via fileExists (tlsDropinFileName) so it can return early if
the file is present but still contains the old value; change the predicate
inside wait.PollImmediate to read and validate the tls_min_version value using
getValueFromCrioConfig (with tlsDropinPath and the regex `tls_min_version =
"(\S+)"`) and only return true when that value equals "VersionTLS13", while
still treating transient errors (from getValueFromCrioConfig or exec) as
non-fatal by returning (false, nil) to keep retrying. Ensure the require.Nil
error message remains if the poll times out.
---
Duplicate comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_bootstrap.go`:
- Around line 108-130: The helper getTLSMinVersionFromKubeletConfigsBootstrap
currently skips KubeletConfigs with TLSSecurityProfile==nil which prevents a
newer config that clears TLS settings from being chosen; change the selection
logic to consider all matching kubeletConfigs (do not continue on
TLSSecurityProfile==nil) so the loop can pick the newest matching kc for the
pool, then after newest is chosen if newest.Spec.TLSSecurityProfile == nil
return "" else call
ctrlcommon.GetSecurityProfileCiphers(newest.Spec.TLSSecurityProfile) to obtain
and return the minVersion.
In `@test/e2e-2of2/ctrcfg_test.go`:
- Around line 378-381: The cleanup deletes for cs.MachineConfigs(),
cs.ContainerRuntimeConfigs(), and cs.KubeletConfigs() are asynchronous and may
still be terminating when the test later calls create, causing AlreadyExists
races; update the test to either use per-run unique resource names (mcName,
ctrcfgName, kcName) or, after calling Delete(...), poll the API (e.g., using a
short retry loop) until Get(...) returns NotFound for each resource before
proceeding to the creates on the later lines (385, 407, 467), ensuring the
objects are fully removed prior to recreation.
- Around line 506-509: The PollImmediate closures fail fast because
getValueFromCrioConfig can return an error on transient node exec failures;
modify the anonymous functions used in wait.PollImmediate (the one calling
getValueFromCrioConfig for tls_min_version and the similar block at the 523-535
range) to treat execution errors as retryable by returning (false, nil) when
getValueFromCrioConfig returns an error, and only return (true, nil) when the
desired value is observed (or return the error only for terminal/non-transient
failures), so the wait loop can continue across reboot windows.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 953fc993-dea6-4101-ae5c-67ab524f9977
📒 Files selected for processing (10)
cmd/machine-config-controller/start.gopkg/controller/bootstrap/bootstrap.gopkg/controller/container-runtime-config/container_runtime_config_bootstrap.gopkg/controller/container-runtime-config/container_runtime_config_bootstrap_test.gopkg/controller/container-runtime-config/container_runtime_config_controller.gopkg/controller/container-runtime-config/container_runtime_config_controller_test.gopkg/controller/container-runtime-config/helpers.gopkg/controller/container-runtime-config/helpers_test.gotest/e2e-2of2/ctrcfg_test.gotest/e2e-bootstrap/bootstrap_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/controller/container-runtime-config/container_runtime_config_bootstrap_test.go
- test/e2e-bootstrap/bootstrap_test.go
- cmd/machine-config-controller/start.go
- pkg/controller/container-runtime-config/helpers.go
…Config controller Signed-off-by: Aditi <asahay@redhat.com>
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 |
|
@asahay19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c9c38cf0-26b8-11f1-8a86-439171d498a4-0 |
|
/payload-job periodic-ci-openshift-release-master-nightly-4.22-e2e-aws-ovn-upgrade-fips |
|
@ngopalak-redhat: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/75472520-26bc-11f1-8107-c74e29aafe8f-0 |
Hey @bitoku . All the jobs got passed. Even the jobs triggered by me also got passed. |
|
/retest |
|
@asahay19: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest |
|
/hold cancel Latest 4.22 build contains crio 1.35.
|
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: asahay19, bitoku, ngopalak-redhat The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/verified by ci |
|
@asahay19: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@asahay19: This pull request references OCPNODE-4106 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@asahay19: This pull request references OCPNODE-4106 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@asahay19: This pull request references OCPNODE-4106 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold |
What this PR does
This PR is part of the OCPSTRAT-2748 and epic OCPNODE-3916 initiative to enforce consistent TLS security policies across all OpenShift components. Today, when a cluster administrator sets a TLS security profile (via KubeletConfig or the cluster APIServer), the kubelet does it but CRI-O does not — leaving CRI-O's streaming and metrics servers with default TLS settings that may not meet the organization's security requirements.
The CRI-O side of this work has already been done and PR has been merged : cri-o/cri-o#9723 , which added support for reading tls_min_version and tls_cipher_suites from CRI-O's drop-in configuration. Both PRs target OpenShift 4.22.
Right now, when an admin sets a TLS profile in KubeletConfig, the kubelet picks it up but CRI-O doesn't. This PR fixes that by generating a CRI-O drop-in file (/etc/crio/crio.conf.d/01-ctrcfg-tls) with the matching TLS version and ciphers. If no KubeletConfig sets TLS, it falls back to the cluster APIServer profile (default: Intermediate/TLS 1.2).
Reference: OCPNODE-4106
Background:
The ContainerRuntimeConfig controller now has a dedicated tlsQueue and syncTLSConfig handler, following the same pattern as kubelet_config_nodes.go. It watches KubeletConfig, APIServer, and MCP events, and for each pool generates a 99--generated-containerruntime-tls MachineConfig with the CRI-O drop-in containing both tls_min_version and tls_cipher_suites .
This runs independently of ContainerRuntimeConfig CRs, so every pool gets TLS config even without any CTRCFGs defined. KubeletConfig takes priority; if none is set, it falls back to the APIServer profile (default: Intermediate/TLS 1.2). When multiple KubeletConfigs target the same pool, the most recently created one (greatertimestamp) wins.
At bootstrap, a separate RunTLSBootstrap function generates TLS MachineConfigs for all pools, called outside the ContainerRuntimeConfig block so TLS is propagated from the start.
The controller writes a TOML drop-in at /etc/crio/crio.conf.d/01-ctrcfg-tls containing the [crio.api] section with tls_min_version and tls_cipher_suites. CRI-O reads this on startup (or reload) and applies the settings to its streaming and metrics servers. For example, an Intermediate profile produces tls_min_version = "VersionTLS12" with the full set of IANA cipher suites, while a Modern profile produces tls_min_version = "VersionTLS13" with only TLS 1.3 ciphers.
Output
Locally tested all the unit test cases are Passing :
--- PASS: TestRunTLSBootstrap (0.00s) --- PASS: TestRunTLSBootstrap/NilKubeletConfigsAndNilAPIServer_FallsBackToIntermediate --- PASS: TestRunTLSBootstrap/KubeletConfigCustomTLS13_OverridesAPIServer --- PASS: TestRunTLSBootstrap/CustomProfileWithCiphers_PropagatesCipherSuites --- PASS: TestRunTLSBootstrap/MCNameAndAnnotations --- PASS: TestCreateCRIOTLSDropinFile (0.00s) --- PASS: TestCreateCRIOTLSDropinFile/TLS_1.2_generates_drop-in --- PASS: TestCreateCRIOTLSDropinFile/TLS_1.3_generates_drop-in --- PASS: TestCreateCRIOTLSDropinFile/Empty_version_returns_error --- PASS: TestCreateCRIOTLSDropinFile/TLS_1.2_with_cipher_suites --- PASS: TestTLSConfigFromKubeletConfigs (0.00s) --- PASS: TestTLSConfigFromKubeletConfigs/No_KubeletConfigs_returns_empty --- PASS: TestTLSConfigFromKubeletConfigs/KubeletConfig_without_TLS_returns_empty --- PASS: TestTLSConfigFromKubeletConfigs/KubeletConfig_with_TLS_1.3_matching_worker_pool --- PASS: TestTLSConfigFromKubeletConfigs/KubeletConfig_with_TLS_targeting_different_pool_returns_empty --- PASS: TestTLSConfigFromKubeletConfigs/KubeletConfig_with_TLS_matching_master_pool --- PASS: TestTLSConfigFromKubeletConfigs/Multiple_matching_KubeletConfigs:_newest_wins --- PASS: TestTLSConfigFromKubeletConfigs/Custom_profile_with_ciphers_returns_ciphers --- PASS: TestTLSConfigFromKubeletConfigs/Intermediate_profile_returns_ciphers --- PASS: TestKubeletConfigTLSPropagationToCRIO (20.30s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/APIServerFallback --- PASS: TestKubeletConfigTLSPropagationToCRIO/CustomProfileTLS12WithCiphers --- PASS: TestKubeletConfigTLSPropagationToCRIO/CustomProfileTLS13 --- PASS: TestKubeletConfigTLSPropagationToCRIO/IntermediateProfile --- PASS: TestKubeletConfigTLSPropagationToCRIO/ModernProfile --- PASS: TestKubeletConfigTLSPropagationToCRIO/RevertOnKubeletConfigDeletione2e test result
asahay@asahay-ubuntu:~/crio-test/machine-config-operator$ go test -v -timeout 30m -run TestKubeletConfigTLSPropagationToCRIO ./test/e2e-2of2/... === RUN TestKubeletConfigTLSPropagationToCRIO ctrcfg_test.go:400: Created empty MCP "tls-api-test" (no matching nodes) ctrcfg_test.go:411: Found dedicated TLS MC "99-tls-api-test-generated-containerruntime-tls" === RUN TestKubeletConfigTLSPropagationToCRIO/APIServerFallback ctrcfg_test.go:506: Verified: APIServer fallback includes cipher suites from Intermediate profile === RUN TestKubeletConfigTLSPropagationToCRIO/CustomProfileTLS12WithCiphers ctrcfg_test.go:529: Verified: Custom profile with TLS 1.2 + specific ciphers propagated to CRI-O === RUN TestKubeletConfigTLSPropagationToCRIO/CustomProfileTLS13 ctrcfg_test.go:542: Verified: Custom profile with TLS 1.3 propagated to CRI-O === RUN TestKubeletConfigTLSPropagationToCRIO/IntermediateProfile ctrcfg_test.go:554: Verified: Intermediate profile propagated TLS 1.2 + cipher suites to CRI-O === RUN TestKubeletConfigTLSPropagationToCRIO/ModernProfile ctrcfg_test.go:562: Verified: Modern profile propagated TLS 1.3 to CRI-O === RUN TestKubeletConfigTLSPropagationToCRIO/RevertOnKubeletConfigDeletion ctrcfg_test.go:573: Verified: Reverted to APIServer fallback with cipher suites after KubeletConfig deletion --- PASS: TestKubeletConfigTLSPropagationToCRIO (20.34s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/APIServerFallback (0.02s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/CustomProfileTLS12WithCiphers (0.03s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/CustomProfileTLS13 (2.04s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/IntermediateProfile (2.05s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/ModernProfile (2.03s) --- PASS: TestKubeletConfigTLSPropagationToCRIO/RevertOnKubeletConfigDeletion (2.03s) PASS ok github.com/openshift/machine-config-operator/test/e2e-2of2 20.415sPre-Merge Testing : https://issues.redhat.com/browse/OCPNODE-4113
Upgrade Testing : https://redhat.atlassian.net/browse/OCPNODE-4220
Functionality Testing : https://redhat.atlassian.net/browse/OCPNODE-4222
cc @ngopalak-redhat @bitoku @haircommander