fix: introduce level-triggered waitable flag to trigger batcher... by supervacuus · Pull Request #1558 · getsentry/sentry-native

supervacuus · 2026-03-06T09:04:30Z

…from any producer.

Fixes: #1402 (https://linear.app/getsentry/issue/NATIVE-130/structured-logs-follow-up-replace-cond-wake-with-level-triggered-flag)

…the producer

src/sentry_batcher.c

…te and fall back for older Windows and Xbox similar to Posix other than Linux and Android.

… cycles

…the buffers transmitted during the final shutdown flush... add them to the test assertion

…fferentiating based on what the non-existent condvar meant previously

…aded

src/sentry_sync.h

src/sentry_batcher.c

supervacuus · 2026-03-06T11:21:18Z

Two things I changed/introduced here that might be of interest:

I currently don't use WaitOnAddress on Xbox because I don't know if the synchronization lib is part of the devkit (you can always enable it later if you want to).
I now let sentry-crash inherit the platform libs from above, because otherwise we have to introduce dependencies for the core twice (this was triggered by the E2E test for Windows). Let me know @mujacica if that would be a problem.

src/sentry_sync.h

…meaningless than before... we are not only bounded by the last flush, but since we now trigger with almost immediate wake on essentially any count in the buffer... we can get requests as high as the number of items we send. The important part is the total count and that one has been stable throughout.

src/sentry_sync.h

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

src/sentry_sync.h

CMakeLists.txt

… could still be running after the example process terminated due to the crash

The two-phase shutdown (begin/wait) was designed to signal both the logs and metrics threads in parallel before joining either. With #1558 replacing the condvar with a level-triggered waitable flag, thread wake latency drops to sub-millisecond (futex/WaitOnAddress), making the parallel signaling unnecessary. Merging into a single function also fixes a race where a concurrent sentry_init() could replace g_batcher between the signal and join steps, causing the new batcher to be shut down instead of the old one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…unting (#1556) * test(logs): add stress test for batcher reinit hang Spawns 8 producer threads continuously logging while the main thread does 10 cycles of sentry_init/sentry_close. Reproduces a condvar corruption hang in the batcher under TSan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): replace static global with dynamic allocation and refcounting The static g_batcher persisted across init/close cycles, causing condvar corruption when sentry__batcher_startup re-initialized the condvar while old threads still used it. Dynamic allocation with refcounting solves both the condvar corruption and the lifetime problem. Introduces sentry_batcher_ref_t with acquire/release/swap API that encapsulates a spinlock to make concurrent access from producer threads and shutdown safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): release old batcher on swap in startup When the previous startup failed to spawn its thread, shutdown_begin returns false and shutdown_wait is never called. The old batcher stays in g_batcher and leaks when startup swaps it out without releasing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): NULL-guard batcher in shutdown_wait Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update CHANGELOG.md * fix(batcher): drain buffer items in release to prevent leaks A producer that acquired a ref before shutdown could enqueue items after the final flush. These items were leaked when the batcher was freed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(batcher): use SENTRY_MAKE macro for allocation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): destroy condvar before freeing batcher memory Add sentry__cond_destroy for all platforms (pthread_cond_destroy on POSIX, CloseHandle on pre-Vista Windows, no-op on Vista+) and call it in sentry__batcher_release. Without this, each SDK re-initialization leaks kernel handles on pre-Vista Windows and violates POSIX which requires pthread_cond_destroy before freeing memory containing a pthread_cond_t. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "fix(batcher): destroy condvar before freeing batcher memory" This reverts commit 467bfb5ecb2c4528e1048dab1c467203fdc29cdf. * refactor(batcher): merge two-phase shutdown into single function The two-phase shutdown (begin/wait) was designed to signal both the logs and metrics threads in parallel before joining either. With #1558 replacing the condvar with a level-triggered waitable flag, thread wake latency drops to sub-millisecond (futex/WaitOnAddress), making the parallel signaling unnecessary. Merging into a single function also fixes a race where a concurrent sentry_init() could replace g_batcher between the signal and join steps, causing the new batcher to be shut down instead of the old one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test(metrics): add reinit stress test Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: speed up reinit stress tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): use lock-free peek in crash-safe flush path The crash-safe flush functions called sentry__batcher_acquire, which spins on a spinlock. If the crash occurs on the thread holding that lock, the signal handler deadlocks. Replace with a new lock-free sentry__batcher_peek that reads ref->ptr via atomic load without taking the spinlock or bumping the refcount (safe because the process is dying). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): use pointer-width atomics in batcher_peek for Windows On Windows x64, `long` is 32-bit but pointers are 64-bit. Use InterlockedCompareExchangePointer on Windows and __atomic_load elsewhere to avoid truncating the pointer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(batcher): add header comments for ref management functions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(batcher): fix race between force flush and reinit force_flush_begin and force_flush_wait each acquired the batcher independently. A concurrent sentry_init could swap the batcher between the two calls, causing wait to flush the new empty batcher while the original data is lost. Fix by returning an opaque token (the acquired batcher ref) from begin and passing it to wait, ensuring both operate on the same instance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update src/sentry_logs.c Co-authored-by: Mischan Toosarani-Hausberger <mischan@abovevacant.com> * Update sentry__metrics_startup * docs(batcher): clarify ownership semantics of options refs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Mischan Toosarani-Hausberger <mischan@abovevacant.com>

Fixes: ``` sentry-native.lib(sentry_batcher.obj) : error LNK2019: unresolved external symbol WaitOnAddress referenced in function batcher_thread_func sentry-native.lib(sentry_batcher.obj) : error LNK2019: unresolved external symbol WakeByAddressSingle referenced in function sentry__batcher_enqueue ``` See also: - getsentry/sentry-native#1558

fix: introduce level-triggered waitable flag to trigger batcher from …

aae26a8

…the producer

jpnurmi reviewed Mar 6, 2026

View reviewed changes

src/sentry_batcher.c Outdated Show resolved Hide resolved

supervacuus added 7 commits March 6, 2026 10:21

ensure we have the WaitOnAddress library linked and also introduce ga…

a34e529

…te and fall back for older Windows and Xbox similar to Posix other than Linux and Android.

add deadline checks to the sleepy polls instead of accumulating sleep…

da8e3db

… cycles

with the new extremely low latency wakes on Windows and Linux we see …

5f93661

…the buffers transmitted during the final shutdown flush... add them to the test assertion

make the active buffer content the flush arbiter in the batcher vs di…

b71eff0

…fferentiating based on what the non-existent condvar meant previously

adapt assertions in test_logs_threaded analogous to test_metrics_thre…

03d0e79

…aded

Merge branch 'master' into fix/introduce_level-triggered_wait_flag

8a0d8ac

let sentry-crash inherit the platform libs

d999237

supervacuus marked this pull request as ready for review March 6, 2026 11:14

cursor bot reviewed Mar 6, 2026

View reviewed changes

src/sentry_sync.h Show resolved Hide resolved

src/sentry_batcher.c Outdated Show resolved Hide resolved

supervacuus requested review from jpnurmi and mujacica March 6, 2026 11:21

jpnurmi mentioned this pull request Mar 6, 2026

fix(batcher): replace static global with dynamic allocation and refcounting #1556

Merged

supervacuus added 2 commits March 6, 2026 13:10

align linux fast path for flag_wait with the other implementations.

af0b993

use SENTRY_BATCHER_FLUSH_INTERVAL_MS

604ed64

cursor bot reviewed Mar 6, 2026

View reviewed changes

src/sentry_sync.h Show resolved Hide resolved

supervacuus added 2 commits March 6, 2026 13:23

align POSIX fallback for consistency

c77e99f

sentry bot reviewed Mar 6, 2026

View reviewed changes

src/sentry_sync.h Outdated Show resolved Hide resolved

cursor bot reviewed Mar 6, 2026

View reviewed changes

src/sentry_sync.h Show resolved Hide resolved

remove another inconsistency that could trip up the bots

a1ad093

jpnurmi reviewed Mar 6, 2026

View reviewed changes

CMakeLists.txt Outdated Show resolved Hide resolved

jpnurmi approved these changes Mar 6, 2026

View reviewed changes

supervacuus added 2 commits March 6, 2026 14:44

use WOA on Xbox too

f6db902

add a time-bounded wait loop for the native backend's daemon since it…

ecf21c9

… could still be running after the example process terminated due to the crash

supervacuus merged commit a42aca5 into master Mar 6, 2026
50 checks passed

supervacuus deleted the fix/introduce_level-triggered_wait_flag branch March 6, 2026 15:43

limbonaut mentioned this pull request Mar 7, 2026

fix(logs): Fix crash when log body contains format specifiers getsentry/sentry-godot#578

Merged

limbonaut mentioned this pull request Mar 9, 2026

chore(deps): update Native SDK to v0.13.2 getsentry/sentry-godot#581

Merged

BrewTestBot mentioned this pull request Mar 9, 2026

sentry-native 0.13.2 Homebrew/homebrew-core#271436

Merged

1 task

jpnurmi mentioned this pull request Mar 11, 2026

chore(deps): update Native SDK to v0.13.2 getsentry/sentry-dotnet#4996

Open

tustanivsky mentioned this pull request Mar 11, 2026

chore(deps): update Native SDK to v0.13.2 getsentry/sentry-unreal#1276

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: introduce level-triggered waitable flag to trigger batcher...#1558

fix: introduce level-triggered waitable flag to trigger batcher...#1558
supervacuus merged 15 commits intomasterfrom
fix/introduce_level-triggered_wait_flag

supervacuus commented Mar 6, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

supervacuus commented Mar 6, 2026

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

supervacuus commented Mar 6, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

supervacuus commented Mar 6, 2026

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants