Skip to content

fast-get: fix partition leak for multi-thread usage#10606

Open
tmleman wants to merge 1 commit intothesofproject:mainfrom
tmleman:topic/upstream/pr/fast_get/fix/partition_leak
Open

fast-get: fix partition leak for multi-thread usage#10606
tmleman wants to merge 1 commit intothesofproject:mainfrom
tmleman:topic/upstream/pr/fast_get/fix/partition_leak

Conversation

@tmleman
Copy link
Contributor

@tmleman tmleman commented Mar 6, 2026

Each thread that calls fast_get() on a large buffer gets a partition added to its memory domain. Previously, fast_put() only removed the partition from the allocating thread's domain (entry->thread), leaking partitions for any additional threads that were granted access.

Fix by having each thread remove its OWN partition on fast_put() using k_current_get() instead of entry->thread. The partition removal now happens unconditionally (not just on last reference), ensuring each thread cleans up after itself.

Order of operations:

  1. Free buffer if last reference (while partition still grants access)
  2. Remove current thread's partition (prevents leaks)
  3. Clear entry if last reference

Each thread that calls fast_get() on a large buffer gets a partition
added to its memory domain. Previously, fast_put() only removed the
partition from the allocating thread's domain (entry->thread), leaking
partitions for any additional threads that were granted access.

Fix by having each thread remove its OWN partition on fast_put() using
k_current_get() instead of entry->thread. The partition removal now
happens unconditionally (not just on last reference), ensuring each
thread cleans up after itself.

Order of operations:
1. Free buffer if last reference (while partition still grants access)
2. Remove current thread's partition (prevents leaks)
3. Clear entry if last reference

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix a userspace memory-domain partition leak when multiple threads call fast_get() on large buffers by ensuring partitions are removed on fast_put() for all participating threads/domains.

Changes:

  • Adjust fast_put() to free the SRAM buffer when the last reference is released.
  • Change partition removal logic to remove a large-buffer partition from the current thread’s memory domain (instead of entry->thread), and do so outside the “last ref” block.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +288 to +303
#if CONFIG_USERSPACE
if (entry->size > FAST_GET_MAX_COPY_SIZE) {
struct k_mem_partition part = {
.start = (uintptr_t)entry->sram_ptr,
.size = ALIGN_UP(entry->size, CONFIG_MM_DRV_PAGE_SIZE),
.attr = K_MEM_PARTITION_P_RO_U_RO | XTENSA_MMU_CACHED_WB,
};
struct k_mem_domain *domain = k_current_get()->mem_domain_info.mem_domain;

LOG_DBG("removing partition %p size %#zx from thread %p",
(void *)part.start, part.size, k_current_get());
int err = k_mem_domain_remove_partition(domain, &part);

if (err)
LOG_WRN("partition removal failed: %d", err);
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partition removal is now executed on every fast_put() for large buffers, even when entry->refcount remains > 0. This breaks fast_get()/fast_put() reference counting: a thread can call fast_get() multiple times and still expect access after only some puts, but the first put would revoke access by removing the partition while the buffer is still in use. Partition removal needs to be tied to the last reference for the relevant owner (e.g., per-thread/per-domain refcount), not performed unconditionally on each put.

Copilot uses AI. Check for mistakes.
Comment on lines +294 to +300
};
struct k_mem_domain *domain = k_current_get()->mem_domain_info.mem_domain;

LOG_DBG("removing partition %p size %#zx from thread %p",
(void *)part.start, part.size, k_current_get());
int err = k_mem_domain_remove_partition(domain, &part);

Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k_mem_domain_remove_partition() operates on a memory domain, which can be shared by multiple threads. Removing the partition from k_current_get()->mem_domain_info.mem_domain can therefore revoke access for other threads in the same domain that still hold references (entry->refcount > 0). To avoid use-after-revoke faults, this code should track granted partitions per-domain (and only remove when the last user in that domain releases), or otherwise enforce/document that each fast_get() user has a unique mem_domain.

Copilot uses AI. Check for mistakes.
Comment on lines +288 to +303
#if CONFIG_USERSPACE
if (entry->size > FAST_GET_MAX_COPY_SIZE) {
struct k_mem_partition part = {
.start = (uintptr_t)entry->sram_ptr,
.size = ALIGN_UP(entry->size, CONFIG_MM_DRV_PAGE_SIZE),
.attr = K_MEM_PARTITION_P_RO_U_RO | XTENSA_MMU_CACHED_WB,
};
struct k_mem_domain *domain = k_current_get()->mem_domain_info.mem_domain;

LOG_DBG("removing partition %p size %#zx from thread %p",
(void *)part.start, part.size, k_current_get());
int err = k_mem_domain_remove_partition(domain, &part);

if (err)
LOG_WRN("partition removal failed: %d", err);
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change targets CONFIG_USERSPACE behavior (partition add/remove), but existing fast_get/fast_put unit tests don’t appear to exercise userspace + large-buffer paths (e.g., multi-thread, partition add/remove, and the intended leak prevention). Consider adding a Zephyr ztest that enables CONFIG_USERSPACE and validates that: (1) multiple gets/puts from the same thread keep access until the final put, and (2) multiple threads (potentially in different domains) don’t leak partitions after puts.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants