feat: add CLI shell and exec commands for pod terminal access by V2arK · Pull Request #124 · CentML/centml-python-client

V2arK · 2026-03-12T18:11:10Z

Summary

Add centml cluster shell <id> for interactive terminal sessions (like docker exec -it)
Add centml cluster exec <id> -- <command> for running commands and returning output (like ssh host "cmd")
Both connect via WebSocket through the Platform API terminal proxy, matching the same protocol used by the Web UI TerminalView
Add get_status_v3() to SDK client for pod discovery
Add websockets>=13.0 dependency

Summary by CodeRabbit

New Features
- Added shell and exec cluster subcommands for interactive and non-interactive pod terminal access.
- Added get_status_v3() API method for deployment status retrieval.
Chores
- Added runtime dependencies: websockets and pyte.
Tests
- Added comprehensive tests for CLI shell commands, SDK shell session/renderer, and test configuration behavior.

tests/test_shell.py

centml/cli/shell.py

centml/sdk/shell/session.py

anandj91 · 2026-03-25T21:14:57Z

@coderabbitai review

coderabbitai · 2026-03-25T21:15:05Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

anandj91 · 2026-03-25T21:21:21Z

centml/cli/shell.py

+@click.option("--pod", default=None, help="Specify a pod name")
+@click.option("--shell", "shell_type", default=None, type=click.Choice(["bash", "sh", "zsh"]), help="Shell type")
+@click.option(
+    "--first-pod", is_flag=True, default=False, help="Auto-select the first running pod (skip interactive selection)"


I don't think we need this right? if --pod is not provided, then we default to first pod

If there are multiple replicas, then we interactively ask the user for a pod. --first-pod ensures that we apply the command to the first pod

centml/cli/shell.py

anandj91 · 2026-03-25T21:24:44Z

@coderabbitai full review

coderabbitai · 2026-03-25T21:24:50Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-03-25T21:25:38Z

Walkthrough

Adds WebSocket-based interactive and non-interactive shell commands to the CLI, SDK session utilities and exceptions, a v3 status API method, new runtime deps, and comprehensive tests for pod resolution, WS sessions, and terminal rendering.

Changes

Cohort / File(s)	Summary
CLI Shell Commands `centml/cli/main.py`, `centml/cli/shell.py`	Register `shell` and `exec` subcommands on the `ccluster` Click group. Implement pod resolution (explicit, first, interactive), token retrieval, WS URL construction, and invocation of interactive or exec async sessions.
SDK Shell Core `centml/sdk/shell/__init__.py`, `centml/sdk/shell/exceptions.py`, `centml/sdk/shell/session.py`	Introduce shell package: exception hierarchy (`ShellError`, `NoPodAvailableError`, `PodNotFoundError`), WS URL builder, running-pod enumeration, interactive and exec session implementations with marker-based output extraction and pyte-based ANSI handling, IO forwarding, terminal sizing, and signal handling.
SDK API Extensions `centml/sdk/api.py`	Add `CentMLClient.get_status_v3(deployment_id)` forwarding to the V3 status endpoint.
Dependencies `requirements.txt`	Add `websockets>=13.0` and `pyte>=0.8.0`.
Test Configuration `tests/conftest.py`	Add pytest `pytest_configure` hook and PyTorch test file allowlist to skip selected files during `--sanity` runs.
CLI Shell Tests `tests/test_cli_shell.py`	New tests for CLI shell and exec commands covering TTY checks, pod resolution (explicit/first/interactive/single/none), option forwarding, and command passthrough via `--`.
SDK Shell Tests `tests/test_sdk_shell_session.py`, `tests/test_sdk_shell_renderer.py`	New tests for WS URL construction, exec session message sequencing and marker-based parsing/exit-code extraction, forward_io termination behavior, interactive session terminal restoration, and ANSI-stripping renderer behavior.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as "CLI Shell"
    participant API as "CentML API"
    participant WS as "WebSocket / Pod"

    User->>CLI: Invoke shell/exec command
    CLI->>API: Query deployment status / running pods
    API-->>CLI: Pod list
    alt Single pod or --first-pod
        CLI->>CLI: Auto-select pod
    else Multiple pods
        CLI->>User: Prompt for pod
        User-->>CLI: Select pod
    end
    CLI->>API: Request auth token
    API-->>CLI: Token
    CLI->>CLI: Build WS URL
    CLI->>WS: Connect with token
    WS-->>CLI: Connection established
    alt Interactive
        CLI->>WS: Resize PTY
        rect rgba(100, 150, 200, 0.5)
            loop Bidirectional I/O
                User->>CLI: stdin
                CLI->>WS: send stdin frames
                WS->>CLI: stdout/stderr frames
                CLI->>User: display output
            end
        end
    else Exec (non-interactive)
        CLI->>WS: Resize + send wrapped command (with markers)
        rect rgba(100, 150, 200, 0.5)
            WS->>CLI: Output including BEGIN/END markers
            CLI->>CLI: Parse output, extract exit code
        end
        CLI->>User: show output and exit code
    end
    CLI->>WS: Close connection
    WS-->>CLI: Closed
    CLI->>User: Return exit code

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 I nibble at code, whiskers all a-quiver,
WebSockets hum as pods deliver,
Markers blink, terminals sing,
Async hops make output spring,
A tiny rabbit cheers: shells now shiver!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main additions: CLI shell and exec commands for pod terminal access, which aligns with the core changes introducing interactive and non-interactive terminal functionality.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch honglin/shell-forwarding-clean

⚔️ Resolve merge conflicts

Resolve merge conflict in branch honglin/shell-forwarding-clean

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

tests/test_sdk_shell_session.py (1)
315-323: Consider explicitly managing the background shutdown task.

The fire-and-forget task pattern here works, but it's implicit. While Python 3.10+ guarantees the task won't be collected, explicitly retaining a reference and cleaning it up in a finally block (or using a context manager) would make the intent clearer and reduce the risk of subtle issues if the code evolves.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_sdk_shell_session.py` around lines 315 - 323, The test starts a
fire-and-forget task via asyncio.create_task(_set_shutdown()) without keeping a
reference; change the _run coroutine to store the task (e.g., shutdown_task =
asyncio.create_task(_set_shutdown())), then wrap the call to forward_io(ws, [80,
24], shutdown) in a try/finally where in finally you cancel the shutdown_task
(if still running) and await it to ensure cleanup; reference symbols to edit:
_run, _set_shutdown, shutdown, forward_io, and the asyncio.create_task call.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@centml/cli/shell.py`:
- Around line 41-56: The _connect_args function can block by calling _select_pod
(which uses click.prompt) when multiple pods exist; modify _connect_args to
avoid prompting when stdin is not a TTY: import sys and before calling
_select_pod check sys.stdin.isatty(), and if it's False and neither pod nor
first_pod is provided raise click.ClickException instructing the caller to pass
--pod or --first-pod; keep the existing behavior (call _select_pod) only when
sys.stdin.isatty() is True. Ensure this logic is applied inside the
_connect_args function surrounding the branch that currently sets pod_name =
_select_pod(running_pods, deployment_id).
- Around line 89-91: The exec_cmd function currently concatenates the
Click-provided command tuple with " ".join(command), which loses
quoting/escaping; replace that with shlex.join(command) and ensure shlex is
imported so exec_cmd calls asyncio.run(exec_session(ws_url, token,
shlex.join(command))) to preserve argument boundaries for POSIX shells (refer to
exec_cmd and exec_session symbols).

In `@centml/sdk/shell/session.py`:
- Around line 226-229: The code currently initializes exit_code = 0 and silently
falls through on ConnectionClosed, causing a false "success" when END_MARKER
wasn't received; update the ConnectionClosed handling in
centml/sdk/shell/session.py to treat a socket close before detecting END_MARKER
as a transport failure by either setting exit_code to a non-zero value (e.g., 1)
or raising an exception (e.g., RuntimeError("socket closed before END_MARKER")),
and ensure this logic checks the buffer/is_done/END_MARKER state so you only
mark failure when the trailer was not parsed; apply the same change to the
second occurrence referenced (around lines 258-260) so both code paths return
non-zero or raise on missing END_MARKER.
- Around line 11-12: The code calls websockets.connect(...) with
additional_headers (in centml/sdk/shell/session.py at the two call sites that
currently use websockets.connect around lines 162 and 208) which is only
supported by the new asyncio client; change the import to use the asyncio client
explicitly by importing connect from websockets.asyncio.client and replace
websockets.connect(...) calls with connect(...), or alternatively raise the
minimum requirement to websockets>=14.0 in requirements so the existing import
"import websockets" remains valid—pick one approach and apply it consistently to
both call sites.

In `@tests/test_cli_shell.py`:
- Around line 43-68: The test helper _patch_deps currently only patches
asyncio.run, causing real coroutine objects from interactive_session and
exec_session to be created (leading to un-awaited coroutine warnings and
preventing assertions); update _patch_deps to also patch
centml.cli.shell.interactive_session and centml.cli.shell.exec_session (or have
the patched asyncio.run explicitly await/close a fake coroutine) so tests can
inspect calls and avoid warnings; ensure the patched names (interactive_session,
exec_session) return a simple mock/coroutine that records the URL, shell type,
and command used by shell() and exec_cmd() for assertions.

---

Nitpick comments:
In `@tests/test_sdk_shell_session.py`:
- Around line 315-323: The test starts a fire-and-forget task via
asyncio.create_task(_set_shutdown()) without keeping a reference; change the
_run coroutine to store the task (e.g., shutdown_task =
asyncio.create_task(_set_shutdown())), then wrap the call to forward_io(ws, [80,
24], shutdown) in a try/finally where in finally you cancel the shutdown_task
(if still running) and await it to ensure cleanup; reference symbols to edit:
_run, _set_shutdown, shutdown, forward_io, and the asyncio.create_task call.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1295c58d-f50f-4796-812b-7dac84fb274d

📥 Commits

Reviewing files that changed from the base of the PR and between 470a8b6 and 3ff11ee.

📒 Files selected for processing (11)

centml/cli/main.py
centml/cli/shell.py
centml/sdk/api.py
centml/sdk/shell/__init__.py
centml/sdk/shell/exceptions.py
centml/sdk/shell/session.py
requirements.txt
tests/conftest.py
tests/test_cli_shell.py
tests/test_sdk_shell_renderer.py
tests/test_sdk_shell_session.py

centml/cli/shell.py

centml/sdk/shell/session.py

coderabbitai · 2026-03-25T21:25:42Z

centml/sdk/shell/session.py

+        exit_code = 0
+        buffer = ""
+        is_capturing = False
+        is_done = False


⚠️ Potential issue | 🟠 Major

Don't report success when the socket closes before END_MARKER.

exit_code starts at 0, and the ConnectionClosed path just falls through. If the pod or proxy disconnects before the trailer is parsed, centml cluster exec returns success even though the command outcome is unknown. Please treat “no end marker received” as a transport failure and return non-zero, or raise instead.

Suggested fix

- exit_code = 0 + exit_code = None @@ - except websockets.ConnectionClosed: - pass - return exit_code + except websockets.ConnectionClosed: + if not is_done: + sys.stderr.write("Error: terminal session closed before command completion\n") + return 1 + return exit_code if is_done and exit_code is not None else 1

Also applies to: 258-260

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@centml/sdk/shell/session.py` around lines 226 - 229, The code currently initializes exit_code = 0 and silently falls through on ConnectionClosed, causing a false "success" when END_MARKER wasn't received; update the ConnectionClosed handling in centml/sdk/shell/session.py to treat a socket close before detecting END_MARKER as a transport failure by either setting exit_code to a non-zero value (e.g., 1) or raising an exception (e.g., RuntimeError("socket closed before END_MARKER")), and ensure this logic checks the buffer/is_done/END_MARKER state so you only mark failure when the trailer was not parsed; apply the same change to the second occurrence referenced (around lines 258-260) so both code paths return non-zero or raise on missing END_MARKER.

tests/test_cli_shell.py

coderabbitai

♻️ Duplicate comments (2)

centml/cli/shell.py (2)

41-60: ⚠️ Potential issue | 🟠 Major

Guard against prompting in non-interactive contexts.

exec_cmd can be invoked non-interactively (CI, scripts), but _connect_args may call _select_pod which uses click.prompt(). This will hang or fail when multiple pods exist and --pod/--first-pod is not specified. Consider requiring --pod or --first-pod when stdin is not a TTY, or defaulting to first pod in non-interactive mode.

🛠️ Proposed fix

 def _connect_args(deployment_id, pod, shell_type, first_pod=False):
     """Resolve pod, build WebSocket URL, and obtain auth token."""
     with get_centml_client() as cclient:
         running_pods = get_running_pods(cclient, deployment_id)
         if not running_pods:
             raise click.ClickException(f"No running pods found for deployment {deployment_id}")
 
         if pod is not None:
             try:
                 pod_name = _resolve_pod(running_pods, pod)
             except ShellError as exc:
                 raise click.ClickException(str(exc)) from exc
         elif len(running_pods) == 1 or first_pod:
             pod_name = running_pods[0]
+        elif not sys.stdin.isatty():
+            raise click.ClickException(
+                "Multiple pods found. Specify --pod or --first-pod for non-interactive use."
+            )
         else:
             pod_name = _select_pod(running_pods, deployment_id)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@centml/cli/shell.py` around lines 41 - 60, The _connect_args function can
block by calling _select_pod (which uses click.prompt) in non-interactive runs;
fix by detecting interactivity (e.g. using sys.stdin.isatty()) and if not
interactive and pod is None and not first_pod and len(running_pods) > 1, do not
call _select_pod: instead raise a click.ClickException telling the user to
provide --pod or --first-pod (or alternatively choose to default to
running_pods[0] if you prefer non-interactive default behavior); implement this
check inside _connect_args before calling _select_pod so build_ws_url and
auth.get_centml_token remain unchanged.

80-92: ⚠️ Potential issue | 🟠 Major

Use shlex.join() to preserve command argument boundaries.

" ".join(command) loses quoting and escaping. Arguments like "hello world", $HOME, or * will be misinterpreted by the pod shell. Use shlex.join(command) for correct POSIX shell escaping.

🛠️ Proposed fix

+import shlex
 ...
 def exec_cmd(deployment_id, command, pod, shell_type, first_pod):
     ws_url, token = _connect_args(deployment_id, pod, shell_type, first_pod)
-    exit_code = asyncio.run(exec_session(ws_url, token, " ".join(command)))
+    exit_code = asyncio.run(exec_session(ws_url, token, shlex.join(command)))
     sys.exit(exit_code)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@centml/cli/shell.py` around lines 80 - 92, The pod exec command concatenates
arguments with " ".join(command) which loses quoting/escaping; change exec_cmd
to construct the shell string with shlex.join(command) before calling
exec_session so argument boundaries are preserved (update the call in exec_cmd
that currently does asyncio.run(exec_session(ws_url, token, " ".join(command)))
to use shlex.join(command)); ensure you import shlex at top of
centml/cli/shell.py if not already present.

🧹 Nitpick comments (4)

centml/sdk/shell/__init__.py (1)

4-12: Consider sorting __all__ for consistency.

Ruff (RUF022) flags that __all__ is not sorted. Sorting improves maintainability and aligns with isort-style conventions.

♻️ Proposed fix

 __all__ = [
-    "ShellError",
     "NoPodAvailableError",
      "PodNotFoundError",
+    "ShellError",
     "build_ws_url",
+    "exec_session",
     "get_running_pods",
     "interactive_session",
-    "exec_session",
 ]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@centml/sdk/shell/__init__.py` around lines 4 - 12, The __all__ list is
unsorted (RUF022); sort the exported names alphabetically to satisfy Ruff and
maintain consistency — reorder the entries in __all__ so identifiers like
"NoPodAvailableError", "PodNotFoundError", "ShellError", "build_ws_url",
"exec_session", "get_running_pods", "interactive_session" appear in ascending
lexicographic order (update the __all__ variable in centml/sdk/shell/__init__.py
where it's defined).

tests/test_sdk_shell_session.py (1)

315-323: Store the task reference to prevent potential garbage collection.

The asyncio.create_task() return value is not stored, which could theoretically allow the task to be garbage collected before completion. While this test likely works in practice, storing the reference is the recommended pattern.

♻️ Suggested fix

                 async def _run():
                     async def _set_shutdown():
                         await asyncio.sleep(0.1)
                         shutdown.set()

-                    asyncio.create_task(_set_shutdown())
+                    _ = asyncio.create_task(_set_shutdown())
                     return await forward_io(ws, [80, 24], shutdown)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_sdk_shell_session.py` around lines 315 - 323, The create_task call
in the local coroutine _run uses asyncio.create_task(_set_shutdown()) without
storing its return value, which risks the task being garbage-collected; assign
the created Task to a local variable (e.g., shutdown_task or setter_task) inside
_run so the reference lives at least until forward_io(ws, [80, 24], shutdown)
completes (optionally await or let it finish naturally), keeping function names
_run and _set_shutdown unchanged.

centml/sdk/shell/session.py (2)

127-136: Consider narrowing the exception catch during task cleanup.

The except (asyncio.CancelledError, Exception) catch is broad and silently suppresses all exceptions from cancelled tasks. While this is common in cleanup code, it could mask unexpected errors. Consider catching specific expected exceptions or at minimum logging unexpected ones.

♻️ Suggested improvement

     for t in pending:
         try:
             await t
-        except (asyncio.CancelledError, Exception):
-            pass
+        except asyncio.CancelledError:
+            pass  # Expected for cancelled tasks
+        except websockets.ConnectionClosed:
+            pass  # Expected during shutdown
+        except Exception:
+            pass  # Suppress other errors during cleanup

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@centml/sdk/shell/session.py` around lines 127 - 136, The cleanup loop
currently swallows all exceptions with "except (asyncio.CancelledError,
Exception): pass", which can hide real errors; update the loop that awaits tasks
in "pending" to handle asyncio.CancelledError separately and to catch Exception
as "e" and log it (e.g. using the module/session logger) instead of silencing
it, so reference the variables pending, done, and t and replace the broad except
with a specific asyncio.CancelledError branch and an Exception branch that logs
the unexpected exception details.

164-168: Unhandled exceptions from fire-and-forget coroutine.

asyncio.ensure_future(ws.send(...)) schedules the send but doesn't handle potential exceptions. If the WebSocket is closed when a resize signal arrives, this could produce unhandled exception warnings.

♻️ Suggested fix with exception handling

             def _send_resize():
                 c, r = shutil.get_terminal_size()
                 term_size[0], term_size[1] = c, r
-                asyncio.ensure_future(ws.send(json.dumps({"operation": "resize", "rows": r, "cols": c})))
+                async def _do_resize():
+                    try:
+                        await ws.send(json.dumps({"operation": "resize", "rows": r, "cols": c}))
+                    except websockets.ConnectionClosed:
+                        pass  # Connection already closed, ignore resize
+                asyncio.ensure_future(_do_resize())

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@centml/sdk/shell/session.py` around lines 164 - 168, The _send_resize closure
currently calls asyncio.ensure_future(ws.send(...)) which can raise unhandled
exceptions if the websocket is closed; change this to schedule a coroutine that
catches and logs or ignores exceptions from ws.send. Create an async helper
(e.g., async def _safe_send_resize(rows, cols): try: await ws.send(...); except
Exception as e: process or debug-log the error) and call
asyncio.create_task(_safe_send_resize(r, c)) inside _send_resize so resize
failures are handled gracefully; reference _send_resize, ws.send, term_size, and
the existing asyncio scheduling call when making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@centml/cli/shell.py`:
- Around line 41-60: The _connect_args function can block by calling _select_pod
(which uses click.prompt) in non-interactive runs; fix by detecting
interactivity (e.g. using sys.stdin.isatty()) and if not interactive and pod is
None and not first_pod and len(running_pods) > 1, do not call _select_pod:
instead raise a click.ClickException telling the user to provide --pod or
--first-pod (or alternatively choose to default to running_pods[0] if you prefer
non-interactive default behavior); implement this check inside _connect_args
before calling _select_pod so build_ws_url and auth.get_centml_token remain
unchanged.
- Around line 80-92: The pod exec command concatenates arguments with "
".join(command) which loses quoting/escaping; change exec_cmd to construct the
shell string with shlex.join(command) before calling exec_session so argument
boundaries are preserved (update the call in exec_cmd that currently does
asyncio.run(exec_session(ws_url, token, " ".join(command))) to use
shlex.join(command)); ensure you import shlex at top of centml/cli/shell.py if
not already present.

---

Nitpick comments:
In `@centml/sdk/shell/__init__.py`:
- Around line 4-12: The __all__ list is unsorted (RUF022); sort the exported
names alphabetically to satisfy Ruff and maintain consistency — reorder the
entries in __all__ so identifiers like "NoPodAvailableError",
"PodNotFoundError", "ShellError", "build_ws_url", "exec_session",
"get_running_pods", "interactive_session" appear in ascending lexicographic
order (update the __all__ variable in centml/sdk/shell/__init__.py where it's
defined).

In `@centml/sdk/shell/session.py`:
- Around line 127-136: The cleanup loop currently swallows all exceptions with
"except (asyncio.CancelledError, Exception): pass", which can hide real errors;
update the loop that awaits tasks in "pending" to handle asyncio.CancelledError
separately and to catch Exception as "e" and log it (e.g. using the
module/session logger) instead of silencing it, so reference the variables
pending, done, and t and replace the broad except with a specific
asyncio.CancelledError branch and an Exception branch that logs the unexpected
exception details.
- Around line 164-168: The _send_resize closure currently calls
asyncio.ensure_future(ws.send(...)) which can raise unhandled exceptions if the
websocket is closed; change this to schedule a coroutine that catches and logs
or ignores exceptions from ws.send. Create an async helper (e.g., async def
_safe_send_resize(rows, cols): try: await ws.send(...); except Exception as e:
process or debug-log the error) and call
asyncio.create_task(_safe_send_resize(r, c)) inside _send_resize so resize
failures are handled gracefully; reference _send_resize, ws.send, term_size, and
the existing asyncio scheduling call when making the change.

In `@tests/test_sdk_shell_session.py`:
- Around line 315-323: The create_task call in the local coroutine _run uses
asyncio.create_task(_set_shutdown()) without storing its return value, which
risks the task being garbage-collected; assign the created Task to a local
variable (e.g., shutdown_task or setter_task) inside _run so the reference lives
at least until forward_io(ws, [80, 24], shutdown) completes (optionally await or
let it finish naturally), keeping function names _run and _set_shutdown
unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 43be36f8-9df6-44f3-8f76-1d891c236df8

📥 Commits

Reviewing files that changed from the base of the PR and between 470a8b6 and 3ff11ee.

📒 Files selected for processing (11)

centml/cli/main.py
centml/cli/shell.py
centml/sdk/api.py
centml/sdk/shell/__init__.py
centml/sdk/shell/exceptions.py
centml/sdk/shell/session.py
requirements.txt
tests/conftest.py
tests/test_cli_shell.py
tests/test_sdk_shell_renderer.py
tests/test_sdk_shell_session.py

Add two new commands under `centml cluster`: - `shell <id>` -- interactive terminal session (like docker exec -it) - `exec <id> -- <command>` -- run a command and return output (like ssh host "cmd") Both connect via WebSocket through the Platform API terminal proxy, matching the same protocol used by the Web UI TerminalView.

Replaces str.replace("https://", "wss://") with urllib.parse.urlparse scheme replacement to avoid CodeQL py/incomplete-url-substring-sanitization.

Replace url.startswith() assertions with urllib.parse.urlparse() checks to satisfy CodeQL py/incomplete-url-substring-sanitization rule. Reformat both shell.py and test_shell.py with black.

… toggle

…ze messages

…rm.js convertEol

…on connect

Add pyte as local terminal emulator (equivalent to xterm.js) to solve cursor positioning and line wrapping issues. Feed WebSocket output through pyte Screen/Stream and render only dirty lines with ANSI escape sequences.

shutil.get_terminal_size() returns (columns, lines), not (rows, cols). The swapped unpacking caused pyte Screen to be created with terminal line count as width, making the display extremely narrow.

…andling Replace regex-based _strip_ansi with pyte single-row screen for marker detection. pyte interprets all VT100/VT220 sequences including OSC and truecolor escapes that the regex could miss.

If two Code signals arrive within 3 seconds, the shell has exited and the reconnect just opened a new session. Exit cleanly instead of looping forever.

Logs to /tmp/centml_shell_debug.log (overridable via CENTML_SHELL_DEBUG_LOG env var). Traces every WS message, stdin event, task lifecycle, reconnect decision, and connection close.

The platform API proxy never forwards ArgoCD Code messages and does not close the WebSocket when the remote shell exits. Replace the Code/reconnect logic with exit echo detection: when the server echoes back "exit\r\n", arm a 2-second idle timeout on ws.recv(). If no more data arrives, the shell has exited -- break out cleanly. Also removes Code handling from _exec_session (markers already work).

…ompt When "exit\r\n" appears at the end of ws data (nothing after it), the shell has exited -- return immediately instead of waiting 2s. When "exit\r\n" is followed by a new prompt (e.g. from echo exit), ignore it and continue the session.

…ry, ruff

github-advanced-security bot found potential problems Mar 12, 2026

View reviewed changes

tests/test_shell.py Fixed Show fixed Hide fixed

V2arK marked this pull request as draft March 12, 2026 18:12

V2arK self-assigned this Mar 12, 2026

V2arK force-pushed the honglin/shell-forwarding-clean branch 2 times, most recently from 0073b53 to 095bd1e Compare March 12, 2026 18:24

V2arK requested a review from michaelshin March 13, 2026 18:50

V2arK assigned michaelshin Mar 15, 2026

V2arK marked this pull request as ready for review March 15, 2026 16:47

michaelshin requested a review from anandj91 March 17, 2026 17:53

anandj91 reviewed Mar 17, 2026

View reviewed changes

centml/cli/shell.py Outdated Show resolved Hide resolved

anandj91 reviewed Mar 17, 2026

View reviewed changes

centml/sdk/shell/session.py Outdated Show resolved Hide resolved

anandj91 reviewed Mar 17, 2026

View reviewed changes

centml/sdk/shell/session.py Outdated Show resolved Hide resolved

michaelshin requested a review from anandj91 March 17, 2026 20:23

anandj91 reviewed Mar 25, 2026

View reviewed changes

centml/cli/shell.py Show resolved Hide resolved

anandj91 reviewed Mar 25, 2026

View reviewed changes

centml/cli/shell.py Outdated Show resolved Hide resolved

coderabbitai bot reviewed Mar 25, 2026

View reviewed changes

V2arK added 7 commits March 31, 2026 12:21

fix: use urlparse for scheme replacement to satisfy CodeQL

e26170c

Replaces str.replace("https://", "wss://") with urllib.parse.urlparse scheme replacement to avoid CodeQL py/incomplete-url-substring-sanitization.

fix: apply black formatting and fix CodeQL url.startswith alert

996e25b

Replace url.startswith() assertions with urllib.parse.urlparse() checks to satisfy CodeQL py/incomplete-url-substring-sanitization rule. Reformat both shell.py and test_shell.py with black.

style: condense multiline expressions for readability

3c380ed

fix: resolve pylint warnings in shell.py and test_shell.py

1d4a977

fix: skip PyTorch-dependent tests in sanity mode

d84ce85

fix: break out of exec loop after end marker to prevent hanging

cef57d7

V2arK and others added 26 commits March 31, 2026 12:21

fix: await cancelled tasks for cleanup, reduce WS close_timeout to 2s

d3529f2

fix: toggle PTY width to force SIGWINCH and prompt redraw on connect

0ae4ba4

fix: include rows/cols in stdin messages and send Ctrl+L after resize…

a661f3b

… toggle

fix: use stty to set PTY dimensions from inside shell instead of resi…

90f8d9e

…ze messages

fix: re-enable OPOST after setraw to convert bare \n to \r\n like xte…

69b8aad

…rm.js convertEol

fix: convert \n to \r\n in output and use stty to fix PTY dimensions …

dd62ac5

…on connect

feat: use pyte terminal emulator for interactive shell rendering

1ba56af

Add pyte as local terminal emulator (equivalent to xterm.js) to solve cursor positioning and line wrapping issues. Feed WebSocket output through pyte Screen/Stream and render only dirty lines with ANSI escape sequences.

fix: swap rows/cols unpacking from shutil.get_terminal_size

7449bf0

shutil.get_terminal_size() returns (columns, lines), not (rows, cols). The swapped unpacking caused pyte Screen to be created with terminal line count as width, making the display extremely narrow.

fix: use alternate screen buffer to prevent scrollback in Warp terminal

ec9230b

fix: handle WebSocket ConnectionClosed to prevent hang on shell exit

571059c

refactor: use pyte for exec ANSI stripping and add ConnectionClosed h…

192a826

…andling Replace regex-based _strip_ansi with pyte single-row screen for marker detection. pyte interprets all VT100/VT220 sequences including OSC and truecolor escapes that the regex could miss.

fix: treat ArgoCD Code message as reconnect signal, not shell exit code

68142c4

fix: stop reconnecting when shell has genuinely exited

18bb688

If two Code signals arrive within 3 seconds, the shell has exited and the reconnect just opened a new session. Exit cleanly instead of looping forever.

chore: add debug file logging to shell and exec for exit hang diagnosis

289362a

Logs to /tmp/centml_shell_debug.log (overridable via CENTML_SHELL_DEBUG_LOG env var). Traces every WS message, stdin event, task lifecycle, reconnect decision, and connection close.

fix: skip websocket close handshake wait after session ends

62176d6

refactor: extract shell logic from CLI to SDK layer

13d1d48

refactor: extract shell logic to SDK layer, rely on server close frame

4e619ec

ruff format

1101c51

refactor: remove debug logging, fix unused imports and SDK/CLI bounda…

2a201ed

…ry, ruff

update redundancy

d3d2064

clean up pyte

f79fbf0

clean up implementation

bb14ba2

address comments

ea6e0c8

fix lint

78cfb45

michaelshin force-pushed the honglin/shell-forwarding-clean branch from 3ff11ee to 78cfb45 Compare March 31, 2026 16:22

michaelshin added 2 commits March 31, 2026 13:19

address comments

6f84d40

revert

1dca1d6

michaelshin requested a review from anandj91 March 31, 2026 18:34

Conversation

V2arK commented Mar 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Summary by CodeRabbit

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anandj91 commented Mar 25, 2026

Uh oh!

coderabbitai bot commented Mar 25, 2026

Uh oh!

anandj91 Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

michaelshin Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

anandj91 commented Mar 25, 2026

Uh oh!

coderabbitai bot commented Mar 25, 2026

Uh oh!

coderabbitai bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

V2arK commented Mar 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 25, 2026 •

edited

Loading