feat: add configurable deploy timeout via --timeout flag and azure.yaml#7045
feat: add configurable deploy timeout via --timeout flag and azure.yaml#7045
Conversation
Add a --timeout flag to azd deploy and a deployTimeout field in azure.yaml service configuration. Users can now control how long azd waits for a deployment to complete, addressing the 20-minute hardcoded default that was too long for simple pipeline deployments. Timeout resolution order: CLI flag > azure.yaml service config > default (1200s). Invalid values (zero or negative) return a clear error. Fixes #6555 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a user-configurable deployment timeout to azd deploy, allowing CI/pipelines to fail faster when Azure long-running deployments hang, using a CLI flag and per-service azure.yaml configuration.
Changes:
- Adds
--timeouttoazd deploywith precedence: flag >azure.yaml(deployTimeout) > default (1200s). - Wraps each service deploy operation in
context.WithTimeoutso Azure SDK polling honors the deadline. - Extends project config/schema/tests and updates CLI usage/completions snapshots.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/internal/cmd/deploy.go | Adds --timeout, resolves precedence, and applies context.WithTimeout around deploy. |
| cli/azd/internal/cmd/deploy_test.go | Adds tests for flag parsing, timeout resolution, and ensuring deploy uses a deadline. |
| cli/azd/pkg/project/service_config.go | Adds DeployTimeout *int to service config model. |
| cli/azd/pkg/project/service_config_test.go | Adds YAML parsing tests for deployTimeout. |
| cli/azd/pkg/project/project_config_test.go | Adds project-level parsing test for deployTimeout. |
| schemas/v1.0/azure.yaml.json | Adds deployTimeout schema property for services. |
| cli/azd/cmd/testdata/TestFigSpec.ts | Updates Fig completion spec snapshot to include --timeout. |
| cli/azd/cmd/testdata/TestUsage-azd-deploy.snap | Updates deploy usage snapshot to include --timeout. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Addresses review feedback from Ratt: - Wrap context.DeadlineExceeded with user-friendly timeout message - Add minimum: 1 to deployTimeout JSON schema Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
How can we communicate to folks that the timeout is only to stop azd but not the actual deployment? For AppService, for example, azd can be stopped, during zip deployment, before sending the zip file and starting the service deployment. That would be a valid cancellation in both sides. |
What I am going to change here is a global timeout setting for deployments instead of per service |
Remove deployTimeout from azure.yaml ServiceConfig and schema per owner direction. The --timeout flag on azd deploy is the only way to configure the deploy timeout. This keeps operational concerns out of project config. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Timeout error now explains azd stopped waiting but deployment may continue server-side, with Portal check guidance (Victor's feedback) - Defer cancel() immediately after WithTimeout (standard Go pattern) - Remove reflection-based t.Skip — access flags.Timeout directly - Fix flaky deadline assertion by capturing start time before Run() Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolution order: CLI --timeout flag > AZD_DEPLOY_TIMEOUT env var > default (1200s). This lets CI/CD pipelines configure deploy timeout without modifying command args. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove .squad/agents/ files that were accidentally committed - Break long lines in deploy.go warning message and deploy_test.go assertions to stay within 125-char lll limit - All review threads from automated reviewer are resolved Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Regarding hemarina's feedback about per-service vs total timeout: the current implementation applies |
weikanglim
left a comment
There was a problem hiding this comment.
Overall looks good -- have some comments
- Remove .squad/ files from PR (hemarina) - Defer context cancel immediately after WithTimeout (Copilot) - Move timeout resolution before service loop (weikanglim) - Check deployCtx.Err() directly for unambiguous timeout detection (weikanglim) - Include --timeout/AZD_DEPLOY_TIMEOUT/azure.yaml in timeout error msg (weikanglim) - Fix flaky test timing by capturing start time before Run() (Copilot) - Remove reflection/t.Skip from tests, use direct field access (Copilot) - Add minimum:1 to deployTimeout in JSON schema (Copilot) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
wbreza
left a comment
There was a problem hiding this comment.
📋 Code Review — PR #7045
Good work on adding configurable deploy timeouts — this addresses a real pain point. The implementation is clean and the test coverage is thorough.
I did a cross-command analysis of timeout patterns across the codebase. This would be the first command with a timeout flag — no other command (provision, package, build, restore, up) has one. All Azure SDK PollUntilDone calls currently use nil options (no timeout). This is a good foundation but raises questions about cohesion across commands.
✅ What Looks Good
context.WithTimeoutpropagation throughPollUntilDoneis elegant — no custom polling needed- Timeout error message is excellent: tells users the deploy may still be running in Azure + how to increase
- 15 table-driven tests with good coverage of precedence, validation, and integration
- Schema updates in both v1.0 and alpha
Summary
| Priority | Count |
|---|---|
| High | 1 |
| Medium | 2 |
| Total | 3 |
Overall Assessment: Comment — the undocumented env var needs attention, and the naming/scope design deserves team input.
| return time.Duration(da.flags.Timeout) * time.Second, nil | ||
| } | ||
|
|
||
| if envVal, ok := os.LookupEnv("AZD_DEPLOY_TIMEOUT"); ok { |
There was a problem hiding this comment.
[High] AZD_DEPLOY_TIMEOUT env var is undocumented
This PR introduces AZD_DEPLOY_TIMEOUT as the middle tier of the resolution precedence (flag > env var > azure.yaml > default), but it's not documented anywhere. Per the AGENTS.md guideline added in #7039, any new os.LookupEnv call should be documented in cli/azd/docs/environment-variables.md.
suggestion if envVal, ok := os.LookupEnv("AZD_DEPLOY_TIMEOUT"); ok { // NOTE: Document this env var in cli/azd/docs/environment-variables.md
"@
},
@{
path = "cli/azd/internal/cmd/deploy.go"
line = 451
side = "RIGHT"
body = @"
[Medium] Design: deployTimeout naming vs. generic timeout
I did a cross-command timeout analysis: no other azd command has a timeout mechanism today. All PollUntilDone calls across provision, container apps, and ARM deployments use PollUntilDone(ctx, nil) with no timeout.
This means deployTimeout sets a precedent for how we name service-level timeouts. A few options worth weighing as a team:
deployTimeout(current) — Specific and clear. But if we later add timeouts for restore/build/package, we'd getrestoreTimeout,buildTimeout, etc. — field proliferation.timeout— Generic, applies to the longest-running service operation (deploy). If we add timeouts for other service commands, this becomes a shared default that can be overridden per-command.- Keep
deployTimeoutand accept that provision timeouts would live underinfra:(different scope entirely).
I could live with deployTimeout if the team prefers specificity, but I'd lean toward a more generic timeout at the service level that defaults to applying to deploy (since that's the primary long-running operation). Worth a quick team discussion.
There was a problem hiding this comment.
I'm not personally a huge fan of deployTimeout either. I could see us avoid introducing the azure.yaml schema change for now just to make things easier to land.
The azure.yaml schema isn't making this easy to land. I could imagine that each service has a separate section for deploy, package parameters
services:
api:
deploy:
timeout: 30000And this would be very nice to add different configuration that customizes the command.
There was a problem hiding this comment.
Agree — will document AZD_DEPLOY_TIMEOUT in cli/azd/docs/environment-variables.md. Good catch.
On the naming discussion with @weikanglim's suggestion of a nested structure:
services:
api:
deploy:
timeout: 300I like this approach — it's more extensible for future per-command config (package, build, etc.) without field proliferation. However, it's a larger schema change. For this PR, I'm inclined to:
- Keep
--timeoutflag as-is (it's the CLI flag, straightforward) - Drop the
deployTimeoutazure.yaml field for now to unblock the PR - Follow up with the nested
deploy: { timeout: ... }schema in a separate PR once we align on the pattern
This way we land the immediate value (flag + env var) without committing to a schema pattern we may want to revise. Thoughts?
| deployTimeout := deployTimeouts[svc.Name] | ||
| var deployResult *project.ServiceDeployResult | ||
| var deployCtx context.Context | ||
| func() { |
There was a problem hiding this comment.
[Medium] Anonymous function for context scoping is unusual
This wraps the deploy in an anonymous function solely to scope defer deployCancel(), but deployCtx and err escape the function and are used afterward. While technically correct (Go preserves DeadlineExceeded after cancel), this is fragile and non-idiomatic.
Consider simplifying — defer in the loop is fine here since the loop body returns on any error:
deployTimeout := deployTimeouts[svc.Name]
deployCtx, deployCancel := context.WithTimeout(ctx, deployTimeout)
defer deployCancel()
deployResult, err := async.RunWithProgress(
func(deployProgress project.ServiceProgress) { ... },
func(progress *async.Progress[project.ServiceProgress]) (*project.ServiceDeployResult, error) {
return da.serviceManager.Deploy(deployCtx, svc, serviceContext, progress)
},
)There was a problem hiding this comment.
Good call — the anonymous function wrapper is awkward. Will simplify to the pattern you suggested:
deployTimeout := deployTimeouts[svc.Name]
deployCtx, deployCancel := context.WithTimeout(ctx, deployTimeout)
defer deployCancel()Since we return on error inside the loop body, defer in the loop works correctly here and is much cleaner.
Problem
azd deployhas no user-configurable timeout. The Azure SDK default timeout is effectively unbounded for App Service and Container Apps deployments, making pipeline feedback loops slow when deployments hang (#6555).Solution
Add a
--timeoutflag toazd deployand adeployTimeoutfield inazure.yamlservice configuration:Timeout resolution order: CLI flag > azure.yaml service config > default (1200s / 20 min)
The timeout wraps the deploy context with
context.WithTimeout, which propagates through all Azure SDKPollUntilDonecalls naturally.Changes
cli/azd/internal/cmd/deploy.go--timeoutflag,resolveDeployTimeout(), apply viacontext.WithTimeoutcli/azd/internal/cmd/deploy_test.gocli/azd/pkg/project/service_config.goDeployTimeout *intfieldcli/azd/pkg/project/service_config_test.gocli/azd/pkg/project/project_config_test.goschemas/v1.0/azure.yaml.jsondeployTimeoutinteger propertycli/azd/cmd/testdata/TestFigSpec.tscli/azd/cmd/testdata/TestUsage-azd-deploy.snapTesting
go build ./...✅Fixes #6555