
Sync redfs rhel9 5 503.40.1 to redfs-ubuntu-noble-6.8.0-58.60 #98

Merged
bsbernd merged 6 commits into redfs-rhel9_5-503.40.1 from sync-redfs-rhel9_5-503.40.1
Feb 14, 2026

Conversation


bsbernd (Collaborator) commented Feb 14, 2026

Based on

Missing from redfs-rhel9_5-503.40.1
    32e0073d (Bernd Schubert) fuse: {io-uring} Prefer the current core over mapping
    f6de786c (Jingbo Xu) fuse: invalidate the page cache after direct write
  * 1607a036 (Horst Birthelmer) fuse: simplify compound commands
  * f18c61e7 (Horst Birthelmer) fuse: avoid tmp copying of data for writeback pages
    ade0d22c (Matthew Wilcox (Oracle)) fuse: Remove fuse_writepage
    461c4ed7 (Horst Birthelmer) Revert "fuse: avoid tmp copying of data for writeback pages"
    5e590a65 (Jingbo Xu) fuse: make foffset alignment opt-in for optimum backend performance

Missing from redfs-ubuntu-noble-6.8.0-58.60
  * e3649d44 (Horst Birthelmer) fuse: simplify compound commands
    cb56803b (Feng Shuo) Fix the compiling error on aarch64

lostjeffle and others added 6 commits February 14, 2026 15:30
Sometimes file offset alignment needs to be made opt-in to achieve
optimum performance at the backend store.

For example, when ErasureCode [1] is used at the backend store, the
optimum write performance is achieved when the WRITE request is aligned
with the stripe size of the ErasureCode.  Otherwise, a non-aligned WRITE
request needs to be split at the stripe size boundary.  Handling these
split partial requests is quite costly: first the whole stripe to which
the partial request belongs needs to be read out, then the read stripe
buffer is overwritten with the request data, and finally the whole
stripe is written back to persistent storage.
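
As an illustration of that read-modify-write cost, here is a minimal
userspace-style sketch; stripe_read()/stripe_write() and the 4MB stripe
size are assumptions for the example, not redfs or fuse APIs:

    #include <string.h>

    #define STRIPE_SIZE (4UL << 20)    /* assumed 4MB ErasureCode stripe */

    /* Hypothetical backend helpers, declared only for the sketch. */
    int stripe_read(char *buf, unsigned long off, unsigned long len);
    int stripe_write(const char *buf, unsigned long off, unsigned long len);

    /* Handle a partial WRITE that does not cover a whole stripe:
     * read the full stripe, patch it in memory, write it all back. */
    static int write_partial_stripe(char *stripe_buf, const char *data,
                                    unsigned long off, unsigned long len)
    {
        unsigned long stripe_start = off & ~(STRIPE_SIZE - 1);
        int err;

        err = stripe_read(stripe_buf, stripe_start, STRIPE_SIZE);
        if (err)
            return err;

        memcpy(stripe_buf + (off - stripe_start), data, len);

        return stripe_write(stripe_buf, stripe_start, STRIPE_SIZE);
    }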

Thus the backend store can suffer severe performance degradation when
WRITE requests do not fit into one stripe exactly.  The write performance
can be 10x slower when the request is 256KB in size given a 4MB stripe
size.  There can also be a 50% performance degradation in theory if the
request is not aligned to a stripe boundary.

In addition, testing indicates that the non-alignment issue becomes
more severe when fuse's max_ratio is decreased, perhaps partly because
background writeback is then more likely to run in parallel with the
dirtier.

fuse's max_ratio	ratio of aligned WRITE requests
----------------	-------------------------------
70			99.9%
40			74%
20			45%
10			20%

With the patched version, which makes the alignment constraint opt-in
when constructing WRITE requests, the ratio of aligned WRITE requests
increases to 98% (previously 20%) when fuse's max_ratio is 10.
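
A minimal sketch of the opt-in constraint when constructing WRITE
requests (illustrative only, not the actual fuse code): cap the request
so it ends on the opted-in alignment boundary whenever that still leaves
data to send.

    #include <stdint.h>

    /* Trim 'count' so that pos + count ends on an 'align' boundary
     * (align must be a power of two); short tail writes stay unaligned. */
    static uint64_t cap_to_alignment(uint64_t pos, uint64_t count,
                                     uint64_t align)
    {
        uint64_t aligned_end = (pos + count) & ~(align - 1);

        if (aligned_end > pos)
            count = aligned_end - pos;
        return count;
    }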

fuse: fix alignment to work with redfs ubuntu

- small fix to make the fuse alignment patch work with redfs ubuntu 6.8.x
- add writeback_control to fuse_writepage_need_send() to make
  more accurate decisions about when to skip sending data (see the
  sketch after this list)
- fix shift number for FUSE_ALIGN_PG_ORDER
- remove test code
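
The writeback_control change is specific to the redfs patch; as a rough,
hypothetical illustration of the idea, a need-send helper that sees the
wbc can distinguish background from data-integrity writeback (the real
criteria in fuse_writepage_need_send() differ):

    #include <linux/pagemap.h>
    #include <linux/writeback.h>

    /* Hypothetical example: with the wbc available, background
     * (WB_SYNC_NONE) writeback can skip folios that data-integrity
     * (WB_SYNC_ALL) writeback must always send. */
    static bool example_need_send(struct folio *folio,
                                  struct writeback_control *wbc,
                                  loff_t i_size)
    {
        if (wbc->sync_mode == WB_SYNC_ALL)
            return true;                     /* fsync etc.: always send */

        return folio_pos(folio) < i_size;    /* illustrative criterion */
    }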

[1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#m9bce469998ea6e4f911555c6f7be1e077ce3d8b4
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
(cherry picked from commit 5e590a6)
This reverts commit 114c4df.

(cherry picked from commit 461c4ed)
The writepage operation is deprecated as it leads to worse performance
under high memory pressure due to folios being written out in LRU order
rather than sequentially within a file.  Use filemap_migrate_folio() to
support dirty folio migration instead of writepage.
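
A minimal sketch of the resulting pattern (not the actual fuse aops
table): no .writepage entry, writeback only via .writepages, and dirty
folio migration wired to the generic filemap_migrate_folio() helper.

    #include <linux/fs.h>
    #include <linux/pagemap.h>
    #include <linux/migrate.h>
    #include <linux/writeback.h>

    /* Stub for illustration only. */
    static int example_writepages(struct address_space *mapping,
                                  struct writeback_control *wbc)
    {
        return 0;
    }

    static const struct address_space_operations example_aops = {
        .writepages    = example_writepages,     /* file-order writeback */
        .dirty_folio   = filemap_dirty_folio,
        .migrate_folio = filemap_migrate_folio,  /* move dirty folios */
        /* no .writepage */
    };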

Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit ade0d22)
When writing back pages with writeback caching enabled, the code copied
the data into temporary pages to avoid a deadlock during memory reclaim.

This is an adaptation and backport of a patch by Joanne Koong <joannelkoong@gmail.com>.

Since we use pinned memory with io_uring, we do not need the temporary copies,
and we do not use the AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM flag in the pagemap.
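
For illustration, the temporary-copy scheme that gets dropped looks
roughly like this (hypothetical helper, not the redfs code); with
io_uring's pinned buffers the request can reference the original folio
directly instead.

    #include <linux/gfp.h>
    #include <linux/highmem.h>

    /* Old scheme (illustrative): copy each dirty page into a freshly
     * allocated temporary page so that memory reclaim never has to wait
     * on fuse writeback of the original page. */
    static struct page *copy_for_writeback(struct page *orig)
    {
        struct page *tmp = alloc_page(GFP_NOFS | __GFP_HIGHMEM);

        if (tmp)
            copy_highpage(tmp, orig);
        return tmp;    /* the WRITE request is built from tmp, not orig */
    }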

Link: https://www.spinics.net/lists/linux-mm/msg407405.html
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
(cherry picked from commit f18c61e)
This fixes xfstests generic/451 (for both O_DIRECT and FOPEN_DIRECT_IO
direct write).

Commit b359af8 ("fuse: Invalidate the page cache after
FOPEN_DIRECT_IO write") tries to fix a similar issue for
FOPEN_DIRECT_IO write, which can be reproduced by xfstests generic/209.
It only fixes the issue for synchronous direct writes, while omitting
the asynchronous direct write case (exactly the one targeted by
generic/451).

The O_DIRECT direct write case is somewhat more complicated.  For
synchronous direct writes, generic_file_direct_write() invalidates
the page cache after the write, and thus generic/209 passes.  For
asynchronous direct writes, the invalidation in
generic_file_direct_write() is bypassed, since the invalidation is
supposed to be done when the asynchronous IO completes.  This is
omitted in FUSE, and hence generic/451 fails.

Fix this by performing the invalidation for both synchronous and
asynchronous writes; see the sketch after the list below.

- with FOPEN_DIRECT_IO
  - sync write,  invalidate in fuse_send_write()
  - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
		 fuse_send_write() otherwise
- without FOPEN_DIRECT_IO
  - sync write,  invalidate in generic_file_direct_write()
  - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
		 generic_file_direct_write() otherwise
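
As a rough sketch of the invalidation itself (illustrative helper, not
the actual fuse functions above), the completion paths end up dropping
the page cache over the written byte range, e.g. via
invalidate_inode_pages2_range():

    #include <linux/pagemap.h>

    /* Drop now-stale page cache covering the byte range a direct write
     * just modified (illustrative; error handling omitted). */
    static void example_invalidate_after_dio(struct address_space *mapping,
                                             loff_t pos, size_t written)
    {
        if (!written)
            return;

        invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
                                      (pos + written - 1) >> PAGE_SHIFT);
    }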

Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
(cherry picked from commit f6de786)
The mapping might point to a totally different core due to random
assignment.  For performance, using the current core might be
beneficial.

Example (with core binding)

unpatched WRITE: bw=841MiB/s
patched   WRITE: bw=1363MiB/s

With
fio --name=test --ioengine=psync --direct=1 \
    --rw=write --bs=1M --iodepth=1 --numjobs=1 \
    --filename_format=/redfs/testfile.\$jobnum --size=100G \
    --thread --create_on_open=1 --runtime=30s --cpus_allowed=1

To get the good (patched) number, `--cpus_allowed=1` is needed.
This could be improved by a future change that avoids CPU migration
in fuse_request_end() on the wake_up() call.
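
A minimal sketch of the queue selection idea (illustrative only, not the
actual fuse io-uring code): index the per-CPU queue by the core the task
is currently running on, rather than by a statically assigned mapping.

    #include <linux/smp.h>

    /* Pick the ring queue for the CPU we are executing on right now. */
    static unsigned int example_pick_queue(unsigned int nr_queues)
    {
        return raw_smp_processor_id() % nr_queues;
    }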

(cherry picked from commit 32e0073)

bsbernd (Collaborator, Author) commented Feb 14, 2026

Remaining differences

bschubert2@imesrv6 linux.git>compare-branches.py -i .github -i configs --b2 ec36c4552148 sync-redfs-rhel9_5-503.40.1 redfs-ubuntu-noble-6.8.0-58.60
Missing from sync-redfs-rhel9_5-503.40.1
  * 1607a036 (Horst Birthelmer) fuse: simplify compound commands

Missing from redfs-ubuntu-noble-6.8.0-58.60
  * e3649d44 (Horst Birthelmer) fuse: simplify compound commands
    cb56803b (Feng Shuo) Fix the compiling error on aarch64

The compound backport has the wrong cherry-pick ID again.

bsbernd merged commit 8faf07d into redfs-rhel9_5-503.40.1 on Feb 14, 2026
2 checks passed