Skip to content

feat: resume intent detection with HTTP 410#11

Closed
maxence2997 wants to merge 1 commit intomainfrom
feat/resume-intent
Closed

feat: resume intent detection with HTTP 410#11
maxence2997 wants to merge 1 commit intomainfrom
feat/resume-intent

Conversation

@maxence2997
Copy link
Copy Markdown
Contributor

Summary

  • Clients reconnecting with ?resume=true after grace window expiry are rejected with HTTP 410 Gone (no WebSocket upgrade), and ResumeAttempt(roomID, connectionID, false) is emitted
  • Hub race case (session expires between HTTP pre-check and hub processing) sends WS close 4100 (CloseSessionExpired) instead of creating a new session
  • CloseSessionExpired (4100) re-exported from core; ResumeAttempt GoDoc updated to reflect success=false semantics

Test plan

  • TestIntegration_MetricsCollector_ResumeAttempt_Rejected — connect, disconnect, grace expiry, reconnect with ?resume=true → HTTP 410, ResumeAttempt(false) emitted, no ConnectionOpened
  • Existing TestIntegration_MetricsCollector_ResumeAttempt (success path) still passes
  • Full integration suite with race detector passes
  • Hub race case test (to be added in follow-up — requires precise timing control between grace timer and hub processing)

🤖 Generated with Claude Code

1.Client ?resume=true + expired session → HTTP 410 + ResumeAttempt(false)
2.Hub race: session expires mid-upgrade → WS close 4100, no new session
3.Re-export CloseSessionExpired from core

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 26, 2026 15:10
@github-actions github-actions bot added the feature New feature label Mar 26, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds explicit “resume intent” handling to distinguish reconnect/resume attempts from new sessions, improving failed-resume observability and avoiding unnecessary upgrades when a resume is doomed.

Changes:

  • Add ?resume=true pre-check in ServeHTTP to reject expired resumption attempts with HTTP 410 Gone and emit ResumeAttempt(..., false).
  • Handle the hub race case by rejecting the connection with WS close 4100 (CloseSessionExpired) instead of creating a new session.
  • Update metrics/docs/changelog and add an integration test for the HTTP 410 rejection path.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
types.go Re-exports CloseSessionExpired close code from core.
server.go Implements HTTP-layer resume intent pre-check and threads resumeIntent into hub registration.
hub.go Adds resumeIntent to registerMessage and handles the race-case rejection with close 4100.
metrics.go Updates ResumeAttempt GoDoc to define success=false semantics and scenarios.
metrics_integration_test.go Adds integration coverage for HTTP 410 + ResumeAttempt(false) emission.
doc/internals.md Documents the resume-intent flow and race-case behavior.
CHANGELOG.md Records the new behavior and exported close code.

Comment on lines 122 to +126
if connectionID == "" {
connectionID = uuid.NewString()
}

resumeIntent := r.URL.Query().Get("resume") == "true"
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServeHTTP assigns a random UUID when ConnectFunc returns an empty connectionID, but resumeIntent is evaluated after that. If a client sends ?resume=true while the server-generated ID path is used, this will always fail the pre-check and return HTTP 410 even though the server cannot meaningfully “resume” without a deterministic connectionID. Consider only honoring resume intent when ConnectFunc provided a non-empty connectionID (or clear resumeIntent when you auto-generate the ID).

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +186
_ = message.transport.WriteMessage(
websocket.CloseMessage,
websocket.FormatCloseMessage(CloseSessionExpired, "session expired"),
)
_ = message.transport.Close()
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleRegister writes a WebSocket close frame directly from the hub event loop. Network I/O here can block the hub goroutine (and delay processing for all sessions) if the peer is slow or the write stalls. Consider moving the close-frame write/transport close into a short-lived goroutine and/or using a write deadline (e.g., WriteControl with a deadline) to keep the hub loop non-blocking.

Suggested change
_ = message.transport.WriteMessage(
websocket.CloseMessage,
websocket.FormatCloseMessage(CloseSessionExpired, "session expired"),
)
_ = message.transport.Close()
// Send a close frame and close the transport outside the hub loop to
// avoid blocking the hub goroutine on network I/O.
go func(conn *websocket.Conn) {
deadline := time.Now().Add(5 * time.Second)
_ = conn.WriteControl(
websocket.CloseMessage,
websocket.FormatCloseMessage(CloseSessionExpired, "session expired"),
deadline,
)
_ = conn.Close()
}(message.transport)

Copilot uses AI. Check for mistakes.
Comment on lines +179 to +181
zap.String("room_id", message.roomID),
)
h.config.metrics.ResumeAttempt(message.roomID, message.connectionID, false)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the hub race-case failure path, ResumeAttempt(false) (and the log field room_id) use message.roomID from ConnectFunc. But ConnectFunc’s roomID is documented as ignored for resumption, so it may not match the room of the session the client was trying to resume. Consider capturing the original session roomID during the ServeHTTP pre-check (when the session exists) and carrying that into registerMessage so metrics/logging for the race-case are attributed to the correct room.

Suggested change
zap.String("room_id", message.roomID),
)
h.config.metrics.ResumeAttempt(message.roomID, message.connectionID, false)
)
// The original session has already been destroyed; we no longer know its
// room. Avoid using message.roomID from ConnectFunc (ignored for resume)
// to prevent misattributing metrics to the wrong room.
h.config.metrics.ResumeAttempt("", message.connectionID, false)

Copilot uses AI. Check for mistakes.
Comment on lines +69 to +76
// ResumeAttempt is called when a client attempts to resume a suspended session.
// success is true when the session was successfully resumed (transport swap),
// and false when the session no longer exists (grace window expired).
//
// A failed resume (success=false) is emitted in two scenarios:
// - ServeHTTP pre-check: client connects with ?resume=true but
// hub.get(connectionID) returns nil. HTTP 410 is returned.
// - Hub race case: pre-check passed but the session expired between
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MetricsCollector interface header comment says hooks are called from the hub/readPump/writePump goroutines, but ResumeAttempt(success=false) is now also invoked from the ServeHTTP goroutine (pre-check). Please update the interface-level documentation to include ServeHTTP so implementers don’t miss this call site/concurrency context.

Copilot uses AI. Check for mistakes.
Comment on lines +171 to +175
// Race case: the HTTP pre-check saw the session, but by the time the
// hub processes the registerMessage the grace timer has fired and
// destroyed it. The client wanted to resume but the session is gone.
// Reject with WS close 4100 (CloseSessionExpired) — do not create a
// new session, consistent with the HTTP 410 rejection in ServeHTTP.
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hub race-case behavior (resumeIntent=true, session removed between HTTP pre-check and hub handling) now rejects the connection (WS close 4100) instead of creating a new session, but there’s no test covering this path. Since it’s timing-sensitive, adding an integration/unit test with controllable clock/synchronization would help prevent regressions.

Copilot uses AI. Check for mistakes.
@maxence2997
Copy link
Copy Markdown
Contributor Author

Superseded — success bool parameter removed entirely instead of implementing resume intent detection. See refactor/remove-resume-success-bool branch.

@maxence2997 maxence2997 deleted the feat/resume-intent branch March 27, 2026 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants