Skip to content

gh-140746: Fix Thread.start() that can hang indefinitely - v2#144750

Open
YvesDup wants to merge 6 commits intopython:mainfrom
YvesDup:thread-hang-memory
Open

gh-140746: Fix Thread.start() that can hang indefinitely - v2#144750
YvesDup wants to merge 6 commits intopython:mainfrom
YvesDup:thread-hang-memory

Conversation

@YvesDup
Copy link
Contributor

@YvesDup YvesDup commented Feb 12, 2026

This PR is based on the @ryv-odoo PR. I would have preferred to start from @ryv-odoo's PR branch rather than duplicating his code, but I don't know if that's possible.

Following the @colesbury comment,

I would lean towards changing Thread.start and _start_joinable_thread so that we can do the waiting in C:

       try:
           # Start joinable thread
           _start_joinable_thread(self._bootstrap, handle=self._os_thread_handle,
                                  daemon=self.daemon)
       except Exception:
           with _active_limbo_lock:
               del _limbo[self]
           raise

I think we could add a new states to _os_thread_handle to represent the state when self._started.set() was called and to represent the state when the thread fails before then.

I submit this PR that includes the waiting in C the module and raise a runtime exception when the thread handle state is set to failed. This exception is treated in the Thread.start.

Copy link
Contributor

@bkap123 bkap123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of small comments on semantics.

@@ -0,0 +1,2 @@
Fix :func:`threading.Thread.start` that can hang indefinitely in case of heap memory exhaustion
during initialization of the new thread.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something about the new FAILED state that is managed by the C code?

Copy link

@ryv-odoo ryv-odoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @YvesDup ,

Now I understand what you meant in your comment 😄. Sorry I certainly misunderstood your comment at the time.

This version changes the behavior slightly from mine: The serving thread (aka the one who call start() on the new Thread) will crash (raise an exception) because of an error raised from the new Thread.
I am not a huge fan of this change because it means that the serving thread might not recover from it (and I feel that errors from sub thread shouldn't impact the parent Thread). But I don't have a strong opinion about it 🤔

I will apply the comments of @bkap123 in my branch since they mainly target my code.

Have a nice day.

@YvesDup
Copy link
Contributor Author

YvesDup commented Feb 25, 2026

I am not a huge fan of this change because it means that the serving thread might not recover from it (and I feel that errors from sub thread shouldn't impact the parent Thread). But I don't have a strong opinion about it 🤔

Hi @ryv-odoo ,
I agree with your remark about the threads, that changes the initial behavior. I hadn't considered that point.

In this PR, it is possible to wait in C and just cleans up the two dicts in the start method in an exception block or outside, regarding the value of self._os_thread_handle.is_bootstraped().

I proposed raising an exception because I would notify the devs that the new thread starts fine but fails before calling the _bootstrap(*) methods. Peharps, it is not a good idea and not appropriate due to the change of initial behavior.


# Give the thread a moment to clean up after itself
# specially in free-threading builds.
time.sleep(0.1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? This looks like it's hiding some underlying problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants