Skip to content

Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack#220

Open
carlydf wants to merge 5 commits intomainfrom
manager-identity
Open

Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack#220
carlydf wants to merge 5 commits intomainfrom
manager-identity

Conversation

@carlydf
Copy link
Collaborator

@carlydf carlydf commented Mar 8, 2026

What was changed

Replace the LastModifierIdentity + temporal.io/ignore-last-modifier metadata mechanism with the Temporal server's first-class ManagerIdentity API for Worker Deployment ownership.

Controller behavior:

  • When the controller is about to make its first routing change to a Worker Deployment (i.e. ManagerIdentity is empty), it calls SetManagerIdentity(Self=true) to claim ownership before applying the change. The conflict token returned is threaded into the subsequent SetCurrentVersion / SetRampingVersion call.
  • If ManagerIdentity is already set to a different identity, the routing change is submitted as-is and the Temporal server rejects it. The controller returns an error and retries on backoff -> no special manual-mode logic needed.
  • The current ManagerIdentity value is surfaced in TemporalWorkerDeploymentStatus.managerIdentity.

Planner: ClaimManagerIdentity bool is now a field on planner.VersionConfig, keeping the decision unit-testable alongside other plan decisions.

Removed: the LastModifierIdentity-based forced-manual-strategy logic in genplan.go and the IgnoreLastModifier/RemoveIgnoreLastModifierBuilds plumbing.

Why?

The old mechanism was a controller-side workaround: it watched LastModifierIdentity, forced manual strategy when it saw a non-controller modifier, and required users to set a special metadata key to pass ownership back. This was fragile (race-prone, required reading the metadata contract) and blocked on version-level metadata support.

ManagerIdentity is a server-enforced, deployment-level field designed exactly for this use case. It gives a clear, atomic protocol for ownership transfer without any special metadata conventions.

Checklist

  1. Closes Use ManagerIdentity instead of LastModifierIdentity to handle changes made by external users #162

  2. How was this tested:

    • New planner unit tests cover the ClaimManagerIdentity decision (claims when ManagerIdentity is empty, does not claim when already set by someone else).
    • Six integration tests updated: three *-blocked-by-manager-identity tests verify the controller cannot make routing changes when ManagerIdentity is held by an external identity; three *-unblocked-after-manager-identity-cleared tests verify the controller proceeds normally once ManagerIdentity is cleared.
  3. Any docs updates needed?

    • Yes: docs/ownership.md has been rewritten to document the ManagerIdentity mechanism and the temporal worker deployment manager-identity set/unset CLI commands for taking and returning control.

carlydf and others added 3 commits March 6, 2026 19:39
Move ClaimManagerIdentity decision into planner.VersionConfig so it can
be unit-tested alongside other plan decisions. Add two planner tests
covering the claim (ManagerIdentity empty) and no-claim (ManagerIdentity
already set) cases.

Replace the six LastModifierIdentity integration tests with equivalent
ManagerIdentity-based tests: blocked tests set ManagerIdentity to an
external identity and assert routing changes are rejected; unblocked tests
set then immediately clear ManagerIdentity so the controller can claim it
normally. Remove the old helper functions (setUnversionedCurrent,
setCurrentAndSetIgnoreModifierMetadata, validateIgnoreLastModifierMetadata)
and add setManagerIdentityToOther, setManagerIdentityBlockThenUnblock, and
validateManagerIdentity.

Rewrite docs/ownership.md to document the ManagerIdentity mechanism and
the temporal worker deployment manager-identity set/unset CLI commands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@carlydf carlydf requested review from a team and jlegrone as code owners March 8, 2026 06:13
@carlydf carlydf changed the title Manager identity Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack Mar 8, 2026
@carlydf carlydf changed the title Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack Use ManagerIdentity API instead of LastModifierIdentity + ignore-last-modifier metadata hack Mar 8, 2026
carlydf and others added 2 commits March 7, 2026 22:49
Refetch the Deployment immediately before calling Status().Update so the
resourceVersion is always current. Previously, the deployment was read at
the start of applyDeployment, workers were started (up to 30s), and then
the stale object was used for the status update — racing with the
controller performing a rolling update in the meantime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines 109 to +143
@@ -132,8 +139,8 @@ func setHealthyDeploymentStatus(t *testing.T, ctx context.Context, k8sClient cli
},
},
}
t.Logf("started %d healthy workers, updating deployment status", *deployment.Spec.Replicas)
if err := k8sClient.Status().Update(ctx, &deployment); err != nil {
t.Logf("started %d healthy workers, updating deployment status", *fresh.Spec.Replicas)
if err := k8sClient.Status().Update(ctx, &fresh); err != nil {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unrelated to the PR, fixes a flaky test. Commit message for more info:

Refetch the Deployment immediately before calling Status().Update so the resourceVersion is always current. Previously, the deployment was read at the start of applyDeployment, workers were started (up to 30s), and then the stale object was used for the status update — racing with the controller performing a rolling update in the meantime.

Comment on lines +50 to +52
// ClaimManagerIdentity indicates the controller should call SetManagerIdentity(Self=true)
// before applying this version config, because ManagerIdentity is currently unset.
ClaimManagerIdentity bool
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a bool field, how about a string field called ManagerIdentity that matches the definition of the WorkerDeployment.ManagerIdentity message field.. I find mixing imperative/instruction-style fields with declarative-style fields in a struct called VersionConfig to be confusing, personally. Better to have logic and conditionals defined in the code that consumes the VersionConfig struct instead of having those conditionals/logic be encoded in bool fields.

In execplan.go, you could have a separate shouldClaimManagerIdentity() function that examines the VersionConfig.ManagerIdentity field:

func (r *TemporalWorkerDeploymentReconciler) shouldClaimManagerIdentity(
	vcfg *VersionConfig,
) bool {
	return vcfg.ManagerIdentity == ""
}

and a little helper function to perform the claim if necessary:

func (r *TemporalWorkerDeploymentReconciler) claimManagerIdentity(
	ctx context.Context,
	l logr.Logger,
	workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment,
	deploymentHandler sdkclient.WorkerDeploymentHandle,
	vcfg *VersionConfig,
) error {
	resp, err := deploymentHandler.SetManagerIdentity(ctx, sdkclient.WorkerDeploymentSetManagerIdentityOptions{
		Self:          true,
		ConflictToken: vcfg.ConflictToken,
	})
	if err != nil {
		l.Error(err, "unable to claim manager identity")
		r.Recorder.Eventf(workerDeploy, corev1.EventTypeWarning, ReasonManagerIdentityClaimFailed,
			"Failed to claim manager identity: %v", err)
		return err
	}
	l.Info("claimed manager identity", "identity", getControllerIdentity())
	// Use the updated conflict token for the subsequent routing config change.
	vcfg.ConflictToken = resp.ConflictToken
	return nil
}

Then updateVersionConfig would be a little easier to follow/read:

func (r *TemporalWorkerDeploymentReconciler) updateVersionConfig(ctx context.Context, l logr.Logger, workerDeploy *temporaliov1alpha1.TemporalWorkerDeployment, deploymentHandler sdkclient.WorkerDeploymentHandle, p *plan) error {
	vcfg := p.UpdateVersionConfig
	if vcfg == nil {
		return nil
	}
	if r.shouldClaimManagerIdentity(vcfg) {
		if err := r.claimManagerIdentity(ctx, l, workerDeploy, vcfg); err != nil {
			return fmt.Errorf("unable to claim manager identity: %w", err)
		}
	}
...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense will do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use ManagerIdentity instead of LastModifierIdentity to handle changes made by external users

2 participants