conductor stop crashes on Windows: OSError [WinError 11] from os.kill(pid, 0) in _is_process_alive #166

@jrob5756

Summary

conductor stop (and conductor stop --all / conductor stop --port N) crashes on Windows with OSError: [WinError 11] An attempt was made to load a program with an incorrect format whenever there is at least one PID file in ~/.conductor/runs/.

The crash happens before any process is stopped, so the command is effectively unusable on Windows.

Reproduction

  1. On Windows, run any workflow with conductor run <workflow.yaml> --web-bg so that a PID file is written to ~/.conductor/runs/.
  2. Run conductor stop --all (or conductor stop / conductor stop --port <N>).

Actual output

```
> conductor stop --all
...
  File "...\conductor\cli\pid.py", line 104, in read_pid_files
    if _is_process_alive(pid):
  File "...\conductor\cli\pid.py", line 173, in _is_process_alive
    os.kill(pid, 0)
OSError: [WinError 11] An attempt was made to load a program with an incorrect format
```

Expected output

The command should list / stop background workflows without crashing, the same way it does on macOS / Linux.

Root cause

src/conductor/cli/pid.py:_is_process_alive uses the Unix idiom os.kill(pid, 0) to test whether a PID is still alive:

```python
def _is_process_alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True
```

On Windows, signal 0 is not a no-op probe. Per the CPython docs:

Windows: The signal.CTRL_C_EVENT and signal.CTRL_BREAK_EVENT signals are special signals which can only be sent to console processes... Any other value for sig will cause the process to be unconditionally killed by the TerminateProcess API.

Because signal.CTRL_C_EVENT is 0 on Windows, os.kill(pid, 0) is not an existence check there at all. CPython takes the console-event branch described above and passes pid to GenerateConsoleCtrlEvent as a process-group ID. Win32 failures on this path are raised as OSError, and many of them don't map to the ProcessLookupError / PermissionError branches the code expects. WinError 11 (ERROR_BAD_FORMAT), seen here, is one example. The exception escapes and crashes the CLI.

It would also be unsafe even if it didn't raise: a "successful" os.kill(pid, 0) on Windows means a CTRL+C console control event was issued through GenerateConsoleCtrlEvent, not that a read-only existence check passed.

Suggested fix

Use a Windows-specific existence check instead of os.kill. A dependency-free option is to call OpenProcess via ctypes and check whether the handle is valid (and optionally whether the process has already exited). Sketch:

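```python
import os
import sys

if sys.platform == "win32":
    import ctypes
    from ctypes import wintypes

    _PROCESS_QUERY_LIMITED_INFORMATION = 0x1000
    _STILL_ACTIVE = 259
    _ERROR_ACCESS_DENIED = 5

    # use_last_error=True lets ctypes.get_last_error() report why OpenProcess failed.
    _kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare prototypes so HANDLE return values are not truncated to a C int.
    _kernel32.OpenProcess.argtypes = (wintypes.DWORD, wintypes.BOOL, wintypes.DWORD)
    _kernel32.OpenProcess.restype = wintypes.HANDLE
    _kernel32.GetExitCodeProcess.argtypes = (wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD))
    _kernel32.GetExitCodeProcess.restype = wintypes.BOOL
    _kernel32.CloseHandle.argtypes = (wintypes.HANDLE,)
    _kernel32.CloseHandle.restype = wintypes.BOOL

    def _is_process_alive(pid: int) -> bool:
        handle = _kernel32.OpenProcess(_PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
        if not handle:
            # Access denied means the process exists but we may not query it,
            # mirroring the PermissionError branch of the Unix path.
            return ctypes.get_last_error() == _ERROR_ACCESS_DENIED
        try:
            exit_code = wintypes.DWORD()
            if _kernel32.GetExitCodeProcess(handle, ctypes.byref(exit_code)):
                # Caveat: a process that exited with code 259 also reads as STILL_ACTIVE.
                return exit_code.value == _STILL_ACTIVE
            return False
        finally:
            _kernel32.CloseHandle(handle)
else:
    def _is_process_alive(pid: int) -> bool:
        try:
            os.kill(pid, 0)  # signal 0 on POSIX: existence/permission check only
        except ProcessLookupError:
            return False
        except PermissionError:
            return True
        return True
```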

Notes:

  • Keep the existing Unix path for non-Windows platforms.
  • Consider also catching OSError (in addition to ProcessLookupError / PermissionError) on Unix and treating an unexpected error as "unknown — assume alive" rather than crashing, so a future surprise on any platform doesn't take down conductor stop.
  • A test that monkeypatches os.kill to raise a generic OSError would have caught this regression.

Environment

  • OS: Windows (path C:\Users\jasonrobert\AppData\Roaming\uv\tools\conductor-cli\Lib\site-packages\conductor\cli\pid.py, WindowsPath in locals)
  • Install method: uv tool install conductor-cli
  • PID file present in ~/.conductor/runs/ (workflow implement-20260506-122050-2105ac7d-63638.pid, target PID 266480, port 63638)

Full traceback

```
> conductor stop --all
╭───── Traceback (most recent call last) ─────╮
│ ...\conductor\cli\app.py:1073 in stop      │
│ ❱ 1073 │   running = read_pid_files()       │
│                                              │
│ ...\conductor\cli\pid.py:104 in read_pid_files │
│ ❱ 104 │   │   if _is_process_alive(pid):    │
│                                              │
│ ...\conductor\cli\pid.py:173 in _is_process_alive │
│ ❱ 173 │   │   os.kill(pid, 0)               │
╰─────────────────────────────────────────────╯
OSError: [WinError 11] An attempt was made to load a program with an incorrect format
```

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
