Skip to content

[CoreCLR/NativeAOT] Simplify synchronization in ManagedValueManager#10796

Open
simonrozsival wants to merge 5 commits intomainfrom
dev/simonrozsival/managed-value-manager-synchronization
Open

[CoreCLR/NativeAOT] Simplify synchronization in ManagedValueManager#10796
simonrozsival wants to merge 5 commits intomainfrom
dev/simonrozsival/managed-value-manager-synchronization

Conversation

@simonrozsival
Copy link
Member

@simonrozsival simonrozsival commented Feb 10, 2026

Summary

Simplify synchronization in ManagedValueManager by removing shared mutable state between the bridge thread and managed threads.

Key insight

The bridge thread must not hold ANY managed lock that managed threads also hold. A managed thread holding a shared lock can be suspended at a GC safepoint (e.g., WeakReference.TryGetTarget()). If the GC then triggers bridge processing, the bridge thread will block trying to acquire the same lock, creating a 3-way deadlock: managed thread suspended → bridge blocked on lock → GC waiting for bridge.

This was validated through extensive stress testing (~30,000 iterations, ~1,500 GC cycles per run) on a physical Android device.

Changes

  1. Remove bridgeProcessingSemaphoreWaitForGCBridgeProcessing() becomes a no-op. The semaphore Wait()/Release() pattern was not actually preventing anything useful.

  2. Embed GCHandle directly in HandleContext struct — eliminates the Dictionary<IntPtr, GCHandle> referenceTrackingHandles which was the last shared mutable state between bridge and managed threads. The GCHandle is stored as an IntPtr field in the struct (native code only accesses the first two fields, so this is safe).

  3. Move EnsureAllContextsAreOurs behind #if DEBUG — this validation used the now-removed dictionary. In DEBUG builds, a simple HashSet<IntPtr> provides the same validation.

  4. Schedule CollectPeers via Task.Run after bridge processing — instead of draining the collected contexts queue on every AddPeer call, cleanup is scheduled on a thread pool thread after FinishCrossReferenceProcessing returns.

  5. Use dedicated System.Threading.Lock — replace lock(RegisteredInstances) with a dedicated Lock instance.

Data ownership after this change

Data Owner Protected by
RegisteredInstances Managed threads _registeredInstancesLock
HandleContext native memory Bridge thread (alloc on managed, free on bridge) N/A (single owner)
CollectedContexts queue Bridge enqueues, managed dequeues ConcurrentQueue (lock-free)

The bridge thread now touches zero shared locks in Release builds.

Fixes #10783

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates CoreCLR/NativeAOT GC-bridge synchronization in ManagedValueManager by replacing the prior SemaphoreSlim/lock approach with a ReaderWriterLockSlim intended to better coordinate access to peer registration state during bridge processing.

Changes:

  • Replaced SemaphoreSlim + lock (RegisteredInstances) with a single ReaderWriterLockSlim (read locks for lookups, write locks for mutations).
  • Refactored AddPeer, RemovePeer, PeekPeer, and GetSurfacedPeers to use the new lock strategy.
  • Removed locking around HandleContext.referenceTrackingHandles and altered bridge-processing start/finish synchronization to use the new lock.
Comments suppressed due to low confidence (1)

src/Mono.Android/Microsoft.Android.Runtime/ManagedValueManager.cs:74

  • CollectPeers() releases rwLock before calling HandleContext.Free(), but Free() mutates the static referenceTrackingHandles dictionary. This can race with bridge processing (or other writer operations) and can corrupt the dictionary. Keep HandleContext.Free() within the same write-lock scope (or otherwise synchronize access to referenceTrackingHandles).
			rwLock.EnterWriteLock ();
			try {
				Remove (context);
			} finally {
				rwLock.ExitWriteLock ();
			}

			HandleContext.Free (ref context);
		}

@simonrozsival simonrozsival marked this pull request as draft February 10, 2026 20:45
@simonrozsival simonrozsival added the copilot `copilot-cli` or other AIs were used to author this label Feb 11, 2026
@simonrozsival simonrozsival force-pushed the dev/simonrozsival/managed-value-manager-synchronization branch from 9d2c568 to b5e7803 Compare February 11, 2026 13:56
@simonrozsival simonrozsival changed the title [CoreCLR/NativeAOT] Fix synchronization in ManagedValueManager [CoreCLR/NativeAOT] Simplify synchronization in ManagedValueManager Feb 11, 2026
…op referenceTrackingHandles dict

- Remove bridgeProcessingSemaphore: WaitForGCBridgeProcessing is now a no-op
  (any shared lock between bridge and managed threads causes 3-way deadlock)
- Embed GCHandle directly in HandleContext struct as IntPtr field
  (eliminates need for Dictionary<IntPtr,GCHandle> shared between threads)
- Move EnsureAllContextsAreOurs behind #if DEBUG
  (uses HashSet instead of the removed Dictionary)
- Net result: bridge thread and managed threads share NO synchronization primitives
- Remove CollectPeers() call from AddPeer (no longer needed on hot path)
- Schedule CollectPeers via Task.Run after FinishCrossReferenceProcessing
  (bridge thread must not take lock(RegisteredInstances) — deadlocks)
- Bridge thread enqueues collected contexts, thread pool thread drains them
Replace lock(RegisteredInstances) with a dedicated Lock instance.
Benchmark: neutral (30,580 vs 30,220 iterations in 2min stress test).
@simonrozsival simonrozsival force-pushed the dev/simonrozsival/managed-value-manager-synchronization branch 2 times, most recently from 120d59c to ca5bea5 Compare February 11, 2026 14:23
@simonrozsival simonrozsival force-pushed the dev/simonrozsival/managed-value-manager-synchronization branch from ca5bea5 to 185d0b3 Compare February 11, 2026 14:24
@simonrozsival
Copy link
Member Author

I had to make more changes because the ReadWriteLockSlim was causing deadlocks.

In its final state, this PR just simplifies the existing code and cleans it up. The most significant change is removing the SemaphoreSlim bridgeProcessingSemaphore which adds unnecessary overhead and does not prevent any race conditions.

@simonrozsival simonrozsival marked this pull request as ready for review February 11, 2026 14:54
Comment on lines 455 to 444
bridgeProcessingSemaphore.Release ();
// Schedule cleanup of _registeredInstances on a thread pool thread.
// The bridge thread must not take lock(_registeredInstances) — see deadlock notes.
Task.Run (GetOrCreateInstance ().CollectPeers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call GetOrCreateInstance()? Is there a way to do the equivalent of Get() and return early if none found?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public static ManagedValueManager GetOrCreateInstance () => s_instance.Value; actually we might rename this to be Instance to help confusion in the future.

@jonathanpeppers
Copy link
Member

I think there is a test failure worth looking into:

Mono.Android.NET_Tests, Java.InteropTests.JavaObjectTest.Dispose_Finalized / Release
System.AggregateException : AggregateException_ctor_DefaultMessage (  Expected: True
But was:  False
)
----> NUnit.Framework.AssertionException :   Expected: True
But was:  False

@simonrozsival
Copy link
Member Author

@copilot address Jonathan's reveiw comments. I have struggled with the Dispose_Finalized test originally when I was implementing the GC bridge, so this is likely due to the changes to CollectPeers or some deferred disposal of handles/contexts. Analyze this test and give me your best reasoning of the usefulness of this test. Should we make sure this test passes as is? Should we modify the test to reflect the changes to ManagedValueManager?

Copy link
Contributor

Copilot AI commented Feb 12, 2026

@simonrozsival I've opened a new pull request, #10806, to work on those changes. Once the pull request is ready, I'll request review from you.

…GCBridgeProcessing (#10806)

* Initial plan

* Make WaitForGCBridgeProcessing synchronously drain pending contexts

Fixes test failure in Dispose_Finalized by ensuring WaitForGCBridgeProcessing
calls CollectPeers() to drain any contexts enqueued during bridge processing.

Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>

* Use TryGetInstance in bridge processing to avoid creating instance

Add TryGetInstance() helper that checks IsValueCreated before accessing Value.
Use it in BridgeProcessingFinished and ProcessCollectedContexts to avoid
creating ManagedValueManager instance unnecessarily during bridge processing.

Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>

* Rename TryGetInstance to GetInstanceIfCreated and clarify comments

Follow conventional naming: GetInstanceIfCreated (returns null) vs Try* (returns bool with out param).
Clarify comments to reflect defensive checking rather than invariants.

Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>

* Rename GetOrCreateInstance to Instance property and make WaitForGCBridgeProcessing a no-op

- Replace GetOrCreateInstance() method with Instance property getter for cleaner singleton pattern
- Update JNIEnvInit.cs to use the new Instance property
- Make WaitForGCBridgeProcessing() a no-op (remove CollectPeers call)
- Update JavaObjectTest.WaitForGC() to explicitly call CollectPeers() after WaitForGCBridgeProcessing()

Note: JavaObjectTest.cs change is in external/Java.Interop submodule and will need to be
committed separately in the Java.Interop repository.

Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>

* Remove GetInstanceIfCreated and use Instance everywhere

Remove the defensive null checks in bridge processing. If bridge processing
is being called, the instance must already exist since objects need to be
registered for bridge processing to happen.

Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: simonrozsival <374616+simonrozsival@users.noreply.github.com>
@jonathanpeppers
Copy link
Member

There must be one spot using the old name:

/Users/runner/work/1/s/android/src/Microsoft.Android.Runtime.NativeAOT/Java.Interop/JreRuntime.cs(64,60): error CS0117: 'ManagedValueManager' does not contain a definition for 'GetOrCreateInstance'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

copilot `copilot-cli` or other AIs were used to author this

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CoreCLR] Investigate potential synchronization issues in the GC Bridge (ManagedValueManager)

4 participants