Fix race condition on SandboxService.waiters #1289
Merged
JaewonHur merged 2 commits into apple:main on Mar 5, 2026
dcantah approved these changes Mar 4, 2026
dcantah reviewed Mar 4, 2026

    cc.resume(returning: ExitStatus(exitCode: -1))
    let (added, exitCode) = self.addWaiter(id: id, cont: cc)
    if !added {
        cc.resume(returning: ExitStatus(exitCode: exitCode ?? -1))

dcantah approved these changes Mar 5, 2026
saehejkang pushed a commit to saehejkang/container that referenced this pull request Mar 6, 2026
This PR fixes apple#1277.

`SandboxService.waiters` had a consistency issue (not exactly a race). The `SandboxService.wait` XPC call can be executed on an arbitrary `id`, and it hangs forever if no other handler resumes it. A higher-level caller that is unaware of this internal detail can run into the issue and deadlock.

This PR simplifies the mental model:

**`SandboxService.waiters[id]: ExitWaiter(continuations, exitCode)` can only be in three states: i) non-existing, ii) existing with nil `exitCode`, and iii) existing with a concrete `exitCode`.**

**If it is non-existing, no handler has been registered to resume it later. If it exists with a nil `exitCode`, the registered `continuations` are guaranteed to be resumed later with a concrete `exitCode`. Finally, if it already has a concrete `exitCode`, a handler was registered and has already resumed the continuations (with that `exitCode`).**

Thus, `SandboxService.wait` should return immediately if `waiters[id]` is non-existing or exists with a concrete `exitCode` (as no handler will resume it later). It should block only when `waiters[id]` exists with a nil `exitCode`, as it is then guaranteed to be resumed later. This guarantees that `SandboxService.wait` cannot deadlock.

To that end, this PR does the following:

1. Introduce an `ExitMonitor` class to update `continuations` and `exitCode` together atomically. Initially, a `state` variable saved the `exitCode`, but it could not be tied to `continuations` because they were protected by different primitives (i.e., lock and actor).
2. Gather `waiters`-related operations into a single actor method, guaranteeing that they are performed atomically under actor protection, i.e., we don't actually need a Mutex here.
3. Ensure initialized `waiters` are released (i.e., resumed) later (under any possible circumstances).
4. Move `process.wait` after `process.start` in `io.handleProcess` to run `SandboxService.wait` only after `waiters[id]` is initialized.
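
The three-state model above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `WaiterRegistry` actor and its `register` and `exited` methods are hypothetical names, and the shapes of `ExitStatus`, `ExitMonitor`, and `addWaiter` are assumptions inferred from the description and the reviewed snippet.

```swift
// Hypothetical stand-in for the real exit status type.
struct ExitStatus: Sendable, Equatable {
    let exitCode: Int32
}

// Assumed shape of a per-id entry: pending continuations plus an optional exit code.
struct ExitMonitor {
    var continuations: [CheckedContinuation<ExitStatus, Never>] = []
    var exitCode: Int32? = nil
}

// Hypothetical actor that owns the waiters dictionary.
actor WaiterRegistry {
    private var waiters: [String: ExitMonitor] = [:]

    // State (i) -> (ii): create the entry, e.g. once the process has started.
    func register(id: String) {
        if waiters[id] == nil {
            waiters[id] = ExitMonitor()
        }
    }

    // State (ii) -> (iii): store the exit code and resume all waiters
    // within one actor-isolated method, so the two cannot drift apart.
    func exited(id: String, exitCode: Int32) {
        guard var monitor = waiters[id], monitor.exitCode == nil else { return }
        let pending = monitor.continuations
        monitor.continuations = []
        monitor.exitCode = exitCode
        waiters[id] = monitor
        for cc in pending {
            cc.resume(returning: ExitStatus(exitCode: exitCode))
        }
    }

    // Stores the continuation only in state (ii). Otherwise reports
    // (false, nil) for state (i) or (false, code) for state (iii).
    private func addWaiter(
        id: String,
        cont: CheckedContinuation<ExitStatus, Never>
    ) -> (Bool, Int32?) {
        guard var monitor = waiters[id] else { return (false, nil) }
        if let code = monitor.exitCode { return (false, code) }
        monitor.continuations.append(cont)
        waiters[id] = monitor
        return (true, nil)
    }

    // Blocks only in state (ii); returns immediately in states (i) and (iii).
    func wait(id: String) async -> ExitStatus {
        await withCheckedContinuation { cc in
            let (added, exitCode) = self.addWaiter(id: id, cont: cc)
            if !added {
                cc.resume(returning: ExitStatus(exitCode: exitCode ?? -1))
            }
        }
    }
}
```

In this sketch, calling `wait` on an unknown `id` returns `-1` right away, and calling it after `exited` returns the stored code, so a caller can only suspend while a later resume is guaranteed.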
With the fourth step, we can guarantee that `SandboxService.wait` encounters only one of the following two `ExitMonitor` states: i) existing with nil `exitCode`, or ii) existing with a concrete `exitCode` (in case the process exited too early). In both cases, `exitCode` is preserved and returned.

## Type of Change
- [X] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Motivation and Context
[Why is this change needed?]

## Testing
- [X] Tested locally
- [ ] Added/updated tests
- [ ] Added/updated docs