
Sync redfs rhel9 5 503.40.1 to redfs-ubuntu-noble-6.8.0-58.60 #98

Merged
bsbernd merged 6 commits into redfs-rhel9_5-503.40.1 from sync-redfs-rhel9_5-503.40.1
Feb 14, 2026

Conversation


bsbernd (Collaborator) commented Feb 14, 2026

Based on

Missing from redfs-rhel9_5-503.40.1
    32e0073d (Bernd Schubert) fuse: {io-uring} Prefer the current core over mapping
    f6de786c (Jingbo Xu) fuse: invalidate the page cache after direct write
  * 1607a036 (Horst Birthelmer) fuse: simplify compound commands
  * f18c61e7 (Horst Birthelmer) fuse: avoid tmp copying of data for writeback pages
    ade0d22c (Matthew Wilcox (Oracle)) fuse: Remove fuse_writepage
    461c4ed7 (Horst Birthelmer) Revert "fuse: avoid tmp copying of data for writeback pages"
    5e590a65 (Jingbo Xu) fuse: make foffset alignment opt-in for optimum backend performance

Missing from redfs-ubuntu-noble-6.8.0-58.60
  * e3649d44 (Horst Birthelmer) fuse: simplify compound commands
    cb56803b (Feng Shuo) Fix the compiling error on aarch64

lostjeffle and others added 6 commits February 14, 2026 15:30
Sometimes file offset alignment needs to be made opt-in to achieve
optimum performance at the backend store.

For example, when ErasureCode [1] is used at the backend store, the
optimum write performance is achieved when the WRITE request is aligned
with the stripe size of the ErasureCode.  Otherwise, a non-aligned WRITE
request needs to be split at the stripe size boundary.  Handling these
split partial requests is quite costly: first the whole stripe to which
the partial request belongs needs to be read out, then the read stripe
buffer is overwritten with the request data, and finally the whole
stripe is written back to persistent storage.
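
As an illustration of that read-modify-write cost, here is a minimal
userspace-style sketch; stripe_read()/stripe_write() and the 4MB stripe
size are assumptions for the example, not redfs or fuse APIs:

    #include <string.h>

    #define STRIPE_SIZE (4UL << 20)    /* assumed 4MB ErasureCode stripe */

    /* Hypothetical backend helpers, declared only for the sketch. */
    int stripe_read(char *buf, unsigned long off, unsigned long len);
    int stripe_write(const char *buf, unsigned long off, unsigned long len);

    /* Handle a partial WRITE that does not cover a whole stripe:
     * read the full stripe, patch it in memory, write it all back. */
    static int write_partial_stripe(char *stripe_buf, const char *data,
                                    unsigned long off, unsigned long len)
    {
        unsigned long stripe_start = off & ~(STRIPE_SIZE - 1);
        int err;

        err = stripe_read(stripe_buf, stripe_start, STRIPE_SIZE);
        if (err)
            return err;

        memcpy(stripe_buf + (off - stripe_start), data, len);

        return stripe_write(stripe_buf, stripe_start, STRIPE_SIZE);
    }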

Thus the backend store can suffer severe performance degradation when
WRITE requests do not fit into one stripe exactly.  The write performance
can be 10x slower when the request is 256KB in size given a 4MB stripe
size.  There can also be a 50% performance degradation in theory if the
request is not aligned to a stripe boundary.

In addition, testing indicates that the non-alignment issue becomes
more severe when fuse's max_ratio is decreased, perhaps partly because
background writeback is then more likely to run in parallel with the
dirtier.

fuse's max_ratio	ratio of aligned WRITE requests
----------------	-------------------------------
70			99.9%
40			74%
20			45%
10			20%

With the patched version, which makes the alignment constraint opt-in
when constructing WRITE requests, the ratio of aligned WRITE requests
increases to 98% (previously 20%) when fuse's max_ratio is 10.
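
A minimal sketch of the opt-in constraint when constructing WRITE
requests (illustrative only, not the actual fuse code): cap the request
so it ends on the opted-in alignment boundary whenever that still leaves
data to send.

    #include <stdint.h>

    /* Trim 'count' so that pos + count ends on an 'align' boundary
     * (align must be a power of two); short tail writes stay unaligned. */
    static uint64_t cap_to_alignment(uint64_t pos, uint64_t count,
                                     uint64_t align)
    {
        uint64_t aligned_end = (pos + count) & ~(align - 1);

        if (aligned_end > pos)
            count = aligned_end - pos;
        return count;
    }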

fuse: fix alignment to work with redfs ubuntu

- small fix to make the fuse alignment patch work with redfs ubuntu 6.8.x
- add writeback_control to fuse_writepage_need_send() to make
  more accurate decisions about when to skip sending data (see the
  sketch after this list)
- fix shift number for FUSE_ALIGN_PG_ORDER
- remove test code
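
The writeback_control change is specific to the redfs patch; as a rough,
hypothetical illustration of the idea, a need-send helper that sees the
wbc can distinguish background from data-integrity writeback (the real
criteria in fuse_writepage_need_send() differ):

    #include <linux/pagemap.h>
    #include <linux/writeback.h>

    /* Hypothetical example: with the wbc available, background
     * (WB_SYNC_NONE) writeback can skip folios that data-integrity
     * (WB_SYNC_ALL) writeback must always send. */
    static bool example_need_send(struct folio *folio,
                                  struct writeback_control *wbc,
                                  loff_t i_size)
    {
        if (wbc->sync_mode == WB_SYNC_ALL)
            return true;                     /* fsync etc.: always send */

        return folio_pos(folio) < i_size;    /* illustrative criterion */
    }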

[1] https://lore.kernel.org/linux-fsdevel/20240124070512.52207-1-jefflexu@linux.alibaba.com/T/#m9bce469998ea6e4f911555c6f7be1e077ce3d8b4
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
(cherry picked from commit 5e590a6)
This reverts commit 114c4df.

(cherry picked from commit 461c4ed)
The writepage operation is deprecated as it leads to worse performance
under high memory pressure due to folios being written out in LRU order
rather than sequentially within a file.  Use filemap_migrate_folio() to
support dirty folio migration instead of writepage.
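
A minimal sketch of the resulting pattern (not the actual fuse aops
table): no .writepage entry, writeback only via .writepages, and dirty
folio migration wired to the generic filemap_migrate_folio() helper.

    #include <linux/fs.h>
    #include <linux/pagemap.h>
    #include <linux/migrate.h>
    #include <linux/writeback.h>

    /* Stub for illustration only. */
    static int example_writepages(struct address_space *mapping,
                                  struct writeback_control *wbc)
    {
        return 0;
    }

    static const struct address_space_operations example_aops = {
        .writepages    = example_writepages,     /* file-order writeback */
        .dirty_folio   = filemap_dirty_folio,
        .migrate_folio = filemap_migrate_folio,  /* move dirty folios */
        /* no .writepage */
    };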

Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit ade0d22)
When writing back pages with writeback caching enabled, the code copied
the data into temporary pages to avoid a deadlock during memory reclaim.

This is an adaptation and backport of a patch by Joanne Koong <joannelkoong@gmail.com>.

Since we use pinned memory with io_uring, we do not need the temporary copies,
and we do not use the AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM flag in the pagemap.
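
For illustration, the temporary-copy scheme that gets dropped looks
roughly like this (hypothetical helper, not the redfs code); with
io_uring's pinned buffers the request can reference the original folio
directly instead.

    #include <linux/gfp.h>
    #include <linux/highmem.h>

    /* Old scheme (illustrative): copy each dirty page into a freshly
     * allocated temporary page so that memory reclaim never has to wait
     * on fuse writeback of the original page. */
    static struct page *copy_for_writeback(struct page *orig)
    {
        struct page *tmp = alloc_page(GFP_NOFS | __GFP_HIGHMEM);

        if (tmp)
            copy_highpage(tmp, orig);
        return tmp;    /* the WRITE request is built from tmp, not orig */
    }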

Link: https://www.spinics.net/lists/linux-mm/msg407405.html
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
(cherry picked from commit f18c61e)
This fixes xfstests generic/451 (for both O_DIRECT and FOPEN_DIRECT_IO
direct write).

Commit b359af8 ("fuse: Invalidate the page cache after
FOPEN_DIRECT_IO write") tries to fix a similar issue for
FOPEN_DIRECT_IO write, which can be reproduced by xfstests generic/209.
It only fixes the issue for synchronous direct writes, while omitting
the asynchronous direct write case (exactly the one targeted by
generic/451).

The O_DIRECT direct write case is somewhat more complicated.  For
synchronous direct writes, generic_file_direct_write() invalidates
the page cache after the write, and thus generic/209 passes.  For
asynchronous direct writes, the invalidation in
generic_file_direct_write() is bypassed, since the invalidation is
supposed to be done when the asynchronous IO completes.  This is
omitted in FUSE, and hence generic/451 fails.

Fix this by performing the invalidation for both synchronous and
asynchronous writes; see the sketch after the list below.

- with FOPEN_DIRECT_IO
  - sync write,  invalidate in fuse_send_write()
  - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
		 fuse_send_write() otherwise
- without FOPEN_DIRECT_IO
  - sync write,  invalidate in generic_file_direct_write()
  - async write, invalidate in fuse_aio_complete() with FUSE_ASYNC_DIO,
		 generic_file_direct_write() otherwise
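
As a rough sketch of the invalidation itself (illustrative helper, not
the actual fuse functions above), the completion paths end up dropping
the page cache over the written byte range, e.g. via
invalidate_inode_pages2_range():

    #include <linux/pagemap.h>

    /* Drop now-stale page cache covering the byte range a direct write
     * just modified (illustrative; error handling omitted). */
    static void example_invalidate_after_dio(struct address_space *mapping,
                                             loff_t pos, size_t written)
    {
        if (!written)
            return;

        invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT,
                                      (pos + written - 1) >> PAGE_SHIFT);
    }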

Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
(cherry picked from commit f6de786)
The mapping might point to a totally different core due to random
assignment.  For performance, using the current core might be
beneficial.

Example (with core binding)

unpatched WRITE: bw=841MiB/s
patched   WRITE: bw=1363MiB/s

With
fio --name=test --ioengine=psync --direct=1 \
    --rw=write --bs=1M --iodepth=1 --numjobs=1 \
    --filename_format=/redfs/testfile.\$jobnum --size=100G \
    --thread --create_on_open=1 --runtime=30s --cpus_allowed=1

To get the good (patched) number, `--cpus_allowed=1` is needed.
This could be improved by a future change that avoids CPU migration
in fuse_request_end() on the wake_up() call.
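
A minimal sketch of the queue selection idea (illustrative only, not the
actual fuse io-uring code): index the per-CPU queue by the core the task
is currently running on, rather than by a statically assigned mapping.

    #include <linux/smp.h>

    /* Pick the ring queue for the CPU we are executing on right now. */
    static unsigned int example_pick_queue(unsigned int nr_queues)
    {
        return raw_smp_processor_id() % nr_queues;
    }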

(cherry picked from commit 32e0073)

bsbernd (Collaborator, Author) commented Feb 14, 2026

Remaining differences

bschubert2@imesrv6 linux.git>compare-branches.py -i .github -i configs --b2 ec36c4552148 sync-redfs-rhel9_5-503.40.1 redfs-ubuntu-noble-6.8.0-58.60
Missing from sync-redfs-rhel9_5-503.40.1
  * 1607a036 (Horst Birthelmer) fuse: simplify compound commands

Missing from redfs-ubuntu-noble-6.8.0-58.60
  * e3649d44 (Horst Birthelmer) fuse: simplify compound commands
    cb56803b (Feng Shuo) Fix the compiling error on aarch64

The compound backport has the wrong cherry-pick ID again.

bsbernd merged commit 8faf07d into redfs-rhel9_5-503.40.1 on Feb 14, 2026
2 checks passed