Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack#220
Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack#220
ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack#220Conversation
…h ManagerIdentity API
Move ClaimManagerIdentity decision into planner.VersionConfig so it can be unit-tested alongside other plan decisions. Add two planner tests covering the claim (ManagerIdentity empty) and no-claim (ManagerIdentity already set) cases. Replace the six LastModifierIdentity integration tests with equivalent ManagerIdentity-based tests: blocked tests set ManagerIdentity to an external identity and assert routing changes are rejected; unblocked tests set then immediately clear ManagerIdentity so the controller can claim it normally. Remove the old helper functions (setUnversionedCurrent, setCurrentAndSetIgnoreModifierMetadata, validateIgnoreLastModifierMetadata) and add setManagerIdentityToOther, setManagerIdentityBlockThenUnblock, and validateManagerIdentity. Rewrite docs/ownership.md to document the ManagerIdentity mechanism and the temporal worker deployment manager-identity set/unset CLI commands. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack
Refetch the Deployment immediately before calling Status().Update so the resourceVersion is always current. Previously, the deployment was read at the start of applyDeployment, workers were started (up to 30s), and then the stale object was used for the status update — racing with the controller performing a rolling update in the meantime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| @@ -132,8 +139,8 @@ func setHealthyDeploymentStatus(t *testing.T, ctx context.Context, k8sClient cli | |||
| }, | |||
| }, | |||
| } | |||
| t.Logf("started %d healthy workers, updating deployment status", *deployment.Spec.Replicas) | |||
| if err := k8sClient.Status().Update(ctx, &deployment); err != nil { | |||
| t.Logf("started %d healthy workers, updating deployment status", *fresh.Spec.Replicas) | |||
| if err := k8sClient.Status().Update(ctx, &fresh); err != nil { | |||
There was a problem hiding this comment.
this is unrelated to the PR, fixes a flaky test. Commit message for more info:
Refetch the Deployment immediately before calling Status().Update so the resourceVersion is always current. Previously, the deployment was read at the start of applyDeployment, workers were started (up to 30s), and then the stale object was used for the status update — racing with the controller performing a rolling update in the meantime.
| // ClaimManagerIdentity indicates the controller should call SetManagerIdentity(Self=true) | ||
| // before applying this version config, because ManagerIdentity is currently unset. | ||
| ClaimManagerIdentity bool |
There was a problem hiding this comment.
Instead of a bool field, how about a string field called ManagerIdentity that matches the definition of the WorkerDeployment.ManagerIdentity message field.. I find mixing imperative/instruction-style fields with declarative-style fields in a struct called VersionConfig to be confusing, personally. Better to have logic and conditionals defined in the code that consumes the VersionConfig struct instead of having those conditionals/logic be encoded in bool fields.
In execplan.go, you could have a separate shouldClaimManagerIdentity() function that examines the VersionConfig.ManagerIdentity field:
func (r *TemporalWorkerDeploymentReconciler) shouldClaimManagerIdentity(
vcfg *VersionConfig,
) bool {
return vcfg.ManagerIdentity == ""
}and a little helper function to perform the claim if necessary:
func (r *TemporalWorkerDeploymentReconciler) claimManagerIdentity(
ctx context.Context,
l logr.Logger,
workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment,
deploymentHandler sdkclient.WorkerDeploymentHandle,
vcfg *VersionConfig,
) error {
resp, err := deploymentHandler.SetManagerIdentity(ctx, sdkclient.WorkerDeploymentSetManagerIdentityOptions{
Self: true,
ConflictToken: vcfg.ConflictToken,
})
if err != nil {
l.Error(err, "unable to claim manager identity")
r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, ReasonManagerIdentityClaimFailed,
"Failed to claim manager identity: %v", err)
return err
}
l.Info("claimed manager identity", "identity", getControllerIdentity())
// Use the updated conflict token for the subsequent routing config change.
vcfg.ConflictToken = resp.ConflictToken
return nil
}Then updateVersionConfig would be a little easier to follow/read:
func (r *TemporalWorkerDeploymentReconciler) updateVersionConfig(ctx context.Context, l logr.Logger, workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment, deploymentHandler sdkclient.WorkerDeploymentHandle, p *plan) error {
vcfg := p.UpdateVersionConfig
if vcfg == nil {
return nil
}
if r.shouldClaimManagerIdentity(vcfg) {
if err := r.claimManagerIdentity(ctx, l, workerDeploy, vcfg); err != nil {
return fmt.Errorf("unable to claim manager identity: %w", err)
}
}
...There was a problem hiding this comment.
makes sense will do!
What was changed
Replace the
LastModifierIdentity+temporal.io/ignore-last-modifiermetadata mechanism with the Temporal server's first-classManagerIdentityAPI for Worker Deployment ownership.Controller behavior:
ManagerIdentityis empty), it callsSetManagerIdentity(Self=true)to claim ownership before applying the change. The conflict token returned is threaded into the subsequent SetCurrentVersion / SetRampingVersion call.ManagerIdentityis already set to a different identity, the routing change is submitted as-is and the Temporal server rejects it. The controller returns an error and retries on backoff -> no special manual-mode logic needed.ManagerIdentityvalue is surfaced inTemporalWorkerDeploymentStatus.managerIdentity.Planner:
ClaimManagerIdentitybool is now a field onplanner.VersionConfig, keeping the decision unit-testable alongside other plan decisions.Removed: the
LastModifierIdentity-based forced-manual-strategy logic ingenplan.goand theIgnoreLastModifier/RemoveIgnoreLastModifierBuildsplumbing.Why?
The old mechanism was a controller-side workaround: it watched
LastModifierIdentity, forced manual strategy when it saw a non-controller modifier, and required users to set a special metadata key to pass ownership back. This was fragile (race-prone, required reading the metadata contract) and blocked on version-level metadata support.ManagerIdentityis a server-enforced, deployment-level field designed exactly for this use case. It gives a clear, atomic protocol for ownership transfer without any special metadata conventions.Checklist
Closes Use
ManagerIdentityinstead ofLastModifierIdentityto handle changes made by external users #162How was this tested:
ClaimManagerIdentitydecision (claims whenManagerIdentityis empty, does not claim when already set by someone else).*-blocked-by-manager-identitytests verify the controller cannot make routing changes whenManagerIdentityis held by an external identity; three*-unblocked-after-manager-identity-clearedtests verify the controller proceeds normally onceManagerIdentityis cleared.Any docs updates needed?
docs/ownership.mdhas been rewritten to document theManagerIdentitymechanism and thetemporal worker deployment manager-identity set/unsetCLI commands for taking and returning control.