blk-cgroup: fix races related to blkg_list iteration #615

blktests-ci[bot] wants to merge 7 commits into linus-master_base from
Conversation
… blkcg_mutex

blkg_destroy_all() iterates q->blkg_list without holding blkcg_mutex,
which can race with blkg_free_workfn() as it removes blkgs from the
list while holding blkcg_mutex. Add blkcg_mutex protection around the
q->blkg_list iteration to prevent potential list corruption or
use-after-free issues.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
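A minimal sketch of the shape of this fix, for illustration only: the
q->blkcg_mutex field and the q->blkg_list walk exist upstream, but the
loop body here is abbreviated and omits the batching/restart logic of
the real blkg_destroy_all().

static void blkg_destroy_all(struct gendisk *disk)
{
        struct request_queue *q = disk->queue;
        struct blkcg_gq *blkg;

        /* serialize against blkg_free_workfn() dropping list entries */
        mutex_lock(&q->blkcg_mutex);
        spin_lock_irq(&q->queue_lock);
        list_for_each_entry(blkg, &q->blkg_list, q_node) {
                struct blkcg *blkcg = blkg->blkcg;

                if (hlist_unhashed(&blkg->blkcg_node))
                        continue;

                spin_lock(&blkcg->lock);
                blkg_destroy(blkg);
                spin_unlock(&blkcg->lock);
        }
        q->root_blkg = NULL;
        spin_unlock_irq(&q->queue_lock);
        mutex_unlock(&q->blkcg_mutex);
}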
…mutex

bfq_end_wr_async() iterates q->blkg_list while holding only bfqd->lock,
not blkcg_mutex. This can race with blkg_free_workfn(), which removes
blkgs from the list while holding blkcg_mutex. Take blkcg_mutex in
bfq_end_wr() before taking bfqd->lock to ensure proper synchronization
when iterating q->blkg_list.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
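An illustrative sketch of the lock ordering only; the weight-raising
loop bodies are elided, and bfqd->queue refers to the owning
request_queue as in the upstream bfq code.

static void bfq_end_wr(struct bfq_data *bfqd)
{
        /* blkcg_mutex pins q->blkg_list for bfq_end_wr_async() below */
        mutex_lock(&bfqd->queue->blkcg_mutex);
        spin_lock_irq(&bfqd->lock);

        /* ... end weight-raising for active and idle bfq_queues ... */
        bfq_end_wr_async(bfqd);         /* iterates q->blkg_list */

        spin_unlock_irq(&bfqd->lock);
        mutex_unlock(&bfqd->queue->blkcg_mutex);
}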
When switching an IO scheduler on a block device, blkcg_activate_policy()
allocates blkg_policy_data (pd) for all blkgs attached to the queue.
However, blkcg_activate_policy() may race with concurrent blkcg
deletion, leading to use-after-free and memory leak issues.

The use-after-free occurs in the following race:

T1 (blkcg_activate_policy):
- Successfully allocates pd for blkg1 (loop0->queue, blkcgA)
- Fails to allocate pd for blkg2 (loop0->queue, blkcgB)
- Enters the enomem rollback path to release blkg1 resources

T2 (blkcg deletion):
- blkcgA is deleted concurrently
- blkg1 is freed via blkg_free_workfn()
- blkg1->pd is freed

T1 (continued):
- Rollback path accesses blkg1->pd->online after pd is freed
- Triggers use-after-free

In addition, blkg_free_workfn() frees pd before removing the blkg from
q->blkg_list. This allows blkcg_activate_policy() to allocate a new pd
for a blkg that is being destroyed, leaving the newly allocated pd
unreachable when the blkg is finally freed.

Fix these races by extending blkcg_mutex coverage to serialize
blkcg_activate_policy() rollback and blkg destruction, ensuring the pd
lifecycle is synchronized with blkg list visibility.

Link: https://lore.kernel.org/all/20260108014416.3656493-3-zhengqixing@huaweicloud.com/
Fixes: f1c006f ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
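A heavily abbreviated sketch of the serialization idea: the real
blkcg_activate_policy() also freezes the queue and runs
pd_init_fn()/pd_online_fn(); only the alloc and rollback paths relevant
to the race are shown here.

int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
{
        struct request_queue *q = disk->queue;
        struct blkcg_gq *blkg;

        mutex_lock(&q->blkcg_mutex);
        list_for_each_entry(blkg, &q->blkg_list, q_node) {
                struct blkg_policy_data *pd;

                pd = pol->pd_alloc_fn(disk, blkg->blkcg, GFP_KERNEL);
                if (!pd)
                        goto enomem;
                pd->blkg = blkg;
                pd->plid = pol->plid;
                blkg->pd[pol->plid] = pd;
        }
        mutex_unlock(&q->blkcg_mutex);
        return 0;

enomem:
        /*
         * Rollback is now safe: blkg_free_workfn() also takes
         * q->blkcg_mutex, so neither blkg->pd[] nor q->blkg_list can
         * change underneath this walk.
         */
        list_for_each_entry(blkg, &q->blkg_list, q_node) {
                if (blkg->pd[pol->plid]) {
                        pol->pd_free_fn(blkg->pd[pol->plid]);
                        blkg->pd[pol->plid] = NULL;
                }
        }
        mutex_unlock(&q->blkcg_mutex);
        return -ENOMEM;
}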
When switching IO schedulers on a block device, blkcg_activate_policy()
can race with concurrent blkcg deletion, leading to a use-after-free in
rcu_accelerate_cbs():
T1:                                     T2:
                                        blkg_destroy
                                         kill(&blkg->refcnt) // blkg->refcnt=1->0
                                         blkg_release // call_rcu(__blkg_release)
                                        ...
                                        blkg_free_workfn
                                         ->pd_free_fn(pd)
elv_iosched_store
 elevator_switch
  ...
  iterate blkg list
   blkg_get(blkg) // blkg->refcnt=0->1
                                        list_del_init(&blkg->q_node)
   blkg_put(pinned_blkg) // blkg->refcnt=1->0
    blkg_release // call_rcu again
     rcu_accelerate_cbs // uaf
Fix this by checking hlist_unhashed(&blkg->blkcg_node) before getting
a reference to the blkg. This is the same check used in blkg_destroy()
to detect if a blkg has already been destroyed. If the blkg is already
unhashed, skip processing it since it's being destroyed.
Link: https://lore.kernel.org/all/20260108014416.3656493-4-zhengqixing@huaweicloud.com/
Fixes: f1c006f ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
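The guard, sketched in the style of the pinning loop in
blkcg_activate_policy() (fragment only; the per-blkg pd allocation that
follows is elided):

        list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
                /*
                 * Skip blkgs that blkg_destroy() already unhashed:
                 * their refcount may have hit zero and blkg_release()
                 * may already be queued, so blkg_get() here would end
                 * in a second call_rcu() on the same rcu_head.
                 */
                if (hlist_unhashed(&blkg->blkcg_node))
                        continue;

                if (pinned_blkg != blkg) {
                        if (pinned_blkg)
                                blkg_put(pinned_blkg);
                        blkg_get(blkg);
                        pinned_blkg = blkg;
                }
                /* ... pd allocation for this blkg continues ... */
        }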
Move the teardown sequence which offlines and frees per-policy
blkg_policy_data (pd) into a helper for readability. No functional
change intended.

Signed-off-by: Zheng Qixing <zhengqixing@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yu Kuai <yukuai@fnnas.com>
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
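For illustration, the factored-out teardown might look like the
following; the helper name blkg_pd_offline_and_free() is hypothetical,
but the loop mirrors the existing per-policy teardown in
blkg_free_workfn().

/* hypothetical helper name, for illustration */
static void blkg_pd_offline_and_free(struct blkcg_gq *blkg)
{
        int i;

        for (i = 0; i < BLKCG_MAX_POLS; i++) {
                struct blkcg_policy *pol = blkcg_policy[i];
                struct blkg_policy_data *pd = blkg->pd[i];

                if (!pd)
                        continue;
                if (pd->online && pol->pd_offline_fn)
                        pol->pd_offline_fn(pd);
                pd->online = false;
                pol->pd_free_fn(pd);
                blkg->pd[i] = NULL;
        }
}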
…cy()

Some policies, like iocost and iolatency, perform percpu allocation in
pd_alloc_fn(). Percpu allocation with the queue frozen can cause
deadlock, because percpu memory reclaim may issue IO. Now that
q->blkg_list is protected by blkcg_mutex, restructure
blkcg_activate_policy() to allocate all pds before freezing the queue:

1. Allocate all pds with GFP_KERNEL before freezing the queue
2. Freeze the queue
3. Initialize and online all pds

Note: future work is to remove all queue freezing before
blkcg_activate_policy() to fix the deadlocks thoroughly.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
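A sketch of the restructured flow (fragment; the enomem rollback and
error handling are elided, and the memflags-returning
blk_mq_freeze_queue() form matches recent kernels):

        mutex_lock(&q->blkcg_mutex);    /* pins q->blkg_list */

        /* 1) allocate all pds with GFP_KERNEL, queue not frozen */
        list_for_each_entry(blkg, &q->blkg_list, q_node) {
                blkg->pd[pol->plid] = pol->pd_alloc_fn(disk, blkg->blkcg,
                                                       GFP_KERNEL);
                if (!blkg->pd[pol->plid])
                        goto enomem;
        }

        /* 2) freeze only after every allocation has succeeded */
        memflags = blk_mq_freeze_queue(q);

        /* 3) initialize and online all pds under the frozen queue */
        list_for_each_entry(blkg, &q->blkg_list, q_node) {
                if (pol->pd_init_fn)
                        pol->pd_init_fn(blkg->pd[pol->plid]);
                if (pol->pd_online_fn)
                        pol->pd_online_fn(blkg->pd[pol->plid]);
                blkg->pd[pol->plid]->online = true;
        }

        blk_mq_unfreeze_queue(q, memflags);
        mutex_unlock(&q->blkcg_mutex);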
The current rq_qos_mutex handling has an awkward pattern: callers must
acquire the mutex before calling rq_qos_add()/rq_qos_del(), and
blkg_conf_open_bdev_frozen() had to release and re-acquire the mutex
around queue freezing to maintain the proper locking order (freeze the
queue before taking the mutex). On the other hand, with rq_qos_mutex
held after blkg_conf_prep(), there are many possible deadlocks:

- allocating memory with GFP_KERNEL, like blk_throtl_init();
- allocating percpu memory, like pd_alloc_fn() for iocost/iolatency.

This patch refactors the locking by:

1. Moving queue freeze and rq_qos_mutex acquisition inside
   rq_qos_add()/rq_qos_del(), with the correct order: freeze first,
   then acquire the mutex.
2. Removing external mutex handling from wbt_init(), since rq_qos_add()
   now handles it internally.
3. Removing rq_qos_mutex handling from blkg_conf_open_bdev() entirely,
   making it responsible only for parsing MAJ:MIN and opening the bdev.
4. Removing the blkg_conf_open_bdev_frozen() and blkg_conf_exit_frozen()
   functions, which are no longer needed.
5. Updating ioc_qos_write() to use the simpler blkg_conf_open_bdev()
   and blkg_conf_exit() functions.

This eliminates the release-and-reacquire pattern and makes
rq_qos_add()/rq_qos_del() self-contained, which is cleaner and reduces
complexity. Each function now properly manages its own locking with the
correct order: queue freeze → mutex acquire → modify → mutex release →
queue unfreeze.

Signed-off-by: Yu Kuai <yukuai@fnnas.com>
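A sketch of a self-contained rq_qos_add() after this change, following
the ordering the patch describes (debugfs registration and rq_qos_del()
are analogous and elided):

int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk,
               enum rq_qos_id id, const struct rq_qos_ops *ops)
{
        struct request_queue *q = disk->queue;
        unsigned int memflags;
        int ret = 0;

        memflags = blk_mq_freeze_queue(q);      /* 1) freeze queue */
        mutex_lock(&q->rq_qos_mutex);           /* 2) then take mutex */

        if (rq_qos_id(q, id)) {                 /* id already registered */
                ret = -EBUSY;
                goto out;
        }

        rqos->disk = disk;
        rqos->id = id;
        rqos->ops = ops;
        rqos->next = q->rq_qos;
        q->rq_qos = rqos;

out:
        mutex_unlock(&q->rq_qos_mutex);         /* 3) release mutex */
        blk_mq_unfreeze_queue(q, memflags);     /* 4) then unfreeze */
        return ret;
}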
Pull request for series with
subject: blk-cgroup: fix races related to blkg_list iteration
version: 3
url: https://patchwork.kernel.org/project/linux-block/list/?series=1061077