fix: cap per-conversation WebSocket subscriber count#3166
fix: cap per-conversation WebSocket subscriber count#3166
Conversation
PubSub accepted unlimited subscribers. A malicious or buggy client could open hundreds of WebSocket connections to a single conversation, exhausting server memory and multiplying fan-out cost for every event. Add an optional max_subscribers parameter to PubSub. When the limit is reached, subscribe() raises MaxSubscribersError. Both EventService and BashEventService default to 50 subscribers. The WebSocket handlers in sockets.py catch the error and close the connection with WS status 1013 (Try Again Later). Fixes #3152 Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste — Clean, focused solution to a real security problem.
This PR addresses WebSocket resource exhaustion with a simple subscriber limit. The implementation is straightforward:
PubSubgains an optionalmax_subscribersparameter- Limit enforced at subscribe time, raises
MaxSubscribersErrorwhen exceeded - WebSocket handlers catch the error and close gracefully (WS 1013)
- Backward compatible:
max_subscribers=Nonepreserves unbounded behavior - Well-tested: 3 new unit tests cover limit enforcement, slot reuse, and unlimited mode
The limit of 50 is generous for legitimate use (~47 WebSocket connections after internal subscribers) while preventing abuse.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This is a server infrastructure change that prevents DoS attacks. It does not affect agent behavior, conversation logic, or benchmark performance. The implementation is simple, maintains backward compatibility, and includes proper error handling. Thread safety is adequate for the asyncio event loop context.
VERDICT:
✅ Worth merging — Addresses a real security concern with minimal, well-tested code.
KEY INSIGHT:
Simple resource limits are effective DoS prevention — this adds robust protection with just ~80 lines of code and zero breaking changes.
Coverage Report •
|
|||||||||||||||||||||||||||||||||||
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Successfully verified that this PR implements per-conversation WebSocket subscriber limits as claimed. The PubSub class correctly enforces the limit, raises appropriate exceptions, and WebSocket handlers respond with the correct error codes.
Does this PR achieve its stated goal?
Yes. The PR successfully addresses issue #3152 by implementing bounded WebSocket connections per conversation. The core mechanism works correctly:
- PubSub enforces limits: Tested with limits of 2, 5, 10, and 50 — all correctly reject subscribers beyond the limit with
MaxSubscribersError - Slots are reusable: Unsubscribing frees capacity for new subscribers
- Backward compatible:
max_subscribers=Nonepreserves unbounded behavior for existing use cases - Defaults applied: Both EventService and BashEventService default to 50 subscribers
- WebSocket handling: Both event and bash WebSocket handlers catch
MaxSubscribersErrorand close with status 1013 (Try Again Later)
The limit of 50 allows ~47 WebSocket connections per conversation (after internal subscribers), which prevents the resource exhaustion attack described in #3152 while remaining generous for legitimate use.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Python 3.13.13, dependencies installed |
| CI Status | ✅ All checks passing (agent-server-tests, sdk-tests, pre-commit, etc.) |
| Functional Verification | ✅ Core mechanism works correctly |
Functional Verification
Verification Approach
Since WebSocket integration testing requires complex conversation setup (the API returned 422 errors due to missing required fields), I verified the fix at two levels:
- Direct PubSub testing — Verified the core limiting mechanism
- Code inspection — Confirmed WebSocket handlers and service defaults
Test 1: PubSub max_subscribers Enforcement
Setup: Created a test script that directly instantiates PubSub with various limits.
Test with limit=10:
$ python test_pubsub_direct.py
Test 2: Max subscribers limit enforcement
============================================================
✓ Successfully added 10 subscribers (limit: 10)
✓ Correctly rejected subscriber beyond limit: Subscriber limit reached (10)This confirms subscribe() raises MaxSubscribersError when the limit is reached.
Test with limit=None:
Test 1: Unlimited subscribers (max_subscribers=None)
============================================================
✓ Successfully added 100 subscribers with no limitThis confirms backward compatibility — None preserves unbounded behavior.
Test 2: Slot Reuse After Unsubscribe
Test scenario: Fill to capacity, unsubscribe one, verify a new subscriber can join.
Test 3: Slot freed after unsubscribe
============================================================
✓ Filled to capacity: 5 subscribers
✓ Correctly rejected subscriber at capacity
✓ Unsubscribed one subscriber (4 remaining)
✓ Successfully subscribed after freeing a slot
✓ Limit still enforced after slot refillThis confirms unsubscribing properly frees capacity.
Test 3: Event Dispatching Still Works
Test scenario: Add 3 subscribers to a limited PubSub, dispatch an event via __call__.
Test 4: Event dispatching with limits
============================================================
✓ Subscribed 3 subscribers
✓ All 3 subscribers received the dispatched messageThis confirms the limit doesn't break event dispatching.
Test 4: Default Limits in Services
Verification method: Inspected source code of EventService and BashEventService.
Test 5: Default limit in EventService and BashEventService
============================================================
✓ EventService defines PubSub with max_subscribers=50
✓ BashEventService defines PubSub with max_subscribers=50Code confirmed:
event_service.py:62-64:PubSub[Event](max_subscribers=50)bash_service.py:32-34:PubSub[BashEventBase](max_subscribers=50)
Test 5: WebSocket Error Handling
Verification method: Code inspection of sockets.py.
Event WebSocket handler (lines 250-259):
try:
subscriber_id = await event_service.subscribe_to_events(
_WebSocketSubscriber(websocket)
)
except MaxSubscribersError:
logger.warning(f"Subscriber limit reached for conversation {conversation_id}")
await websocket.close(
code=1013, reason="Too many connections for this conversation"
)
returnBash WebSocket handler (lines 369-376):
try:
subscriber_id = await bash_event_service.subscribe_to_events(
_BashWebSocketSubscriber(websocket)
)
except MaxSubscribersError:
logger.warning("Subscriber limit reached for bash events")
await websocket.close(code=1013, reason="Too many bash event connections")
returnBoth handlers correctly:
- Catch
MaxSubscribersError - Log a warning
- Close the WebSocket with status 1013 (Try Again Later)
- Return without adding the subscriber
Test 6: Existing Test Suite
CI results:
agent-server-tests: ✅ pass (50s) — includes 3 new PubSub testssdk-tests: ✅ pass (10s)tools-tests: ✅ pass (9s)workspace-tests: ✅ pass (9s)pre-commit: ✅ pass (1m16s)coverage-report: ✅ pass (19s)
All 875+ existing tests pass, confirming no regressions.
Summary of Test Results
| Test | Status | Evidence |
|---|---|---|
| PubSub limit enforcement | ✅ | Rejects subscriber #11 with limit=10 |
| Backward compatibility (None) | ✅ | Accepts 100 subscribers with no limit |
| Slot reuse after unsubscribe | ✅ | New subscriber succeeds after freeing slot |
| Event dispatching with limits | ✅ | All subscribers receive events |
| EventService default=50 | ✅ | Source code inspection confirmed |
| BashEventService default=50 | ✅ | Source code inspection confirmed |
| WebSocket error handling | ✅ | Code inspection confirmed |
| No regressions | ✅ | All CI tests pass |
Issues Found
None.
Assessment: This PR successfully implements the resource exhaustion fix described in #3152. The mechanism is sound, backward compatible, and properly integrated into the WebSocket handlers.
Summary
PubSubaccepted unlimited subscribers. A malicious or buggy client could open hundreds of WebSocket connections to a single conversation, exhausting server memory and multiplying fan-out cost for every published event.Fix
Add an optional
max_subscribersparameter toPubSub. When the limit is reached,subscribe()raisesMaxSubscribersError.pub_sub.pymax_subscribers: int | Nonefield andMaxSubscribersErrorexceptionevent_service.pymax_subscribers=50for conversation event PubSubbash_service.pymax_subscribers=50for bash event PubSubsockets.pyMaxSubscribersError→ close with WS 1013 (Try Again Later)The limit of 50 allows ~47 WebSocket connections per conversation (after 2–3 internal subscribers: EventSubscriber, AutoTitleSubscriber, WebhookSubscribers). This is very generous for legitimate use while preventing resource exhaustion.
PubSub(max_subscribers=None)preserves the old unbounded behavior for backward compat.Verification
Fixes #3152
This PR was created by an AI agent (OpenHands) on behalf of @csmith49.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:331aa18-pythonRun
All tags pushed for this build
About Multi-Architecture Support
331aa18-python) is a multi-arch manifest supporting both amd64 and arm64331aa18-python-amd64) are also available if needed