Skip to content

Preserve meaningful last_error in LibevConnection close paths#700

Open
dkropachev wants to merge 1 commit intomasterfrom
fix/libev-close-race-614
Open

Preserve meaningful last_error in LibevConnection close paths#700
dkropachev wants to merge 1 commit intomasterfrom
fix/libev-close-race-614

Conversation

@dkropachev
Copy link
Collaborator

@dkropachev dkropachev commented Feb 14, 2026

Summary

Improves error reporting in LibevConnection so that users see a clear ConnectionShutdown message instead of a confusing [Errno 9] Bad file descriptor when connections close during node restarts.

The libev reactor was already safe from the EBADF race — defunct() returns early when is_closed is true, and the single-threaded event loop + _loop_will_run() design prevents watchers from firing on closed fds. The actual problem was that last_error wasn't set to a meaningful value in two close paths, so the stale EBADF string leaked up to factory() and confused users.

Changes

  • Set last_error in close() when connected_event is not yet set, so factory() reports ConnectionShutdown instead of whatever stale error was on the connection
  • Set last_error on server-initiated close (EOF) in handle_read() before calling close()

Test plan

  • tests/unit/io/test_libevreactor.py passes

Refs #614

@dkropachev dkropachev marked this pull request as draft February 14, 2026 16:40
@dkropachev dkropachev force-pushed the fix/libev-close-race-614 branch from 8543c1f to 6aabcee Compare February 16, 2026 13:02
LibevConnection.close() closes the socket immediately while watchers
are stopped asynchronously in the next event loop iteration via
_loop_will_run(). This creates a race window where handle_read() or
handle_write() can operate on a closed socket fd, producing EBADF
errors that surface as ConnectionShutdown.

- Add is_closed/is_defunct guards in handle_read() and handle_write()
  error paths to silently exit during shutdown instead of calling
  defunct() with EBADF
- Set last_error in close() when connected_event is not yet set to
  prevent factory() from returning a dead connection
- Set last_error on server-initiated close (EOF) in handle_read()
  before calling close()
@dkropachev dkropachev force-pushed the fix/libev-close-race-614 branch from 6aabcee to 96add52 Compare February 16, 2026 13:09
@dkropachev dkropachev changed the title Fix LibevConnection close() race causing EBADF errors Preserve meaningful last_error in LibevConnection close paths Feb 16, 2026
@dkropachev dkropachev self-assigned this Feb 16, 2026
@dkropachev dkropachev marked this pull request as ready for review February 16, 2026 13:26
Comment on lines +384 to 386
self.last_error = ConnectionShutdown(
"Connection to %s was closed by server" % self.endpoint)
self.close()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you set it unconditionally here, but guard it with if not self.connected_event.is_set(): in the other place?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in close, it guards against overwriting last_error when connection was closer before it was properly initialized, in handle_read it is just channeling error reason back to close, connection is already initialized at this point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I don't get it :(
connected_event can be set in 4 places (for libevreactor at least, I didn't loo at others):

  1. defunct in connection.py, which calls close before setting the event.
  2. _handle_startup_response in case of ReadyMessage - successfull connection creation.
  3. _handle_auth_response in case of AuthSuccessMessage - successfull connection creation.
  4. close in libevreactor.py.

You said that you want to guard against overwriting error when connection was closed before properly initialized, but it looks to me like you are doing the opposite.
You are overwriting error only if event was not yet set. If it was not yet set then we didn't receive ReadyMessage or AuthSuccessMessage. We also can't be in defunct, because we are in if not self.is_defunct:, and also defunct calls close before setting event.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sylwiaszunejko since you approved the PR, maybe you understand this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am wrong, but I understand it that way that if it was set by defunct or close we will not call handle_read anymore, so there is no other error that we would possible want to have here. In close if it was not yet set then we didn't receive ReadyMessage or AuthSuccessMessage, so the conn was not properly initialized as Dmitry said, or no?
I am now confused tbh

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think I get it. If the event is set in this check, it was set by either ReadyMessage (_handle_startup_response), or by AuthSuccessMessage (_handle_auth_response). In both of those cases the connection is properly initialized.
So the check looks correct - it will only set error if connection was not initialized yet.
I would love to better understand the paths that can trigger this, to verify that they don't set last_error already, but I won't hold this PR over that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants