News: 0001504348


Fresh Take On Linux Uncached Buffered I/O "RWF_UNCACHED" Nets 65~75% Improvement

([Linux Storage] 3 Hours Ago RWF_UNCACHED)


Linux I/O expert and block/io_uring maintainer Jens Axboe of Meta has recently revisited his patches for uncached buffered I/O. Axboe started the "RWF_UNCACHED" effort back in 2019 to address the throughput cliff that buffered I/O hits once the page cache fills up. That work faded away, but he has now written a fresh set of patches implementing uncached buffered I/O, and they are showing extremely promising results.

Jens Axboe posted today on [1]Twitter/X about his 2024 work on uncached buffered I/O for Linux:

"Uncached buffered IO is back, after a 5 year hiatus. Simpler and cleaner now. Up to 65-75% improvement, at half the CPU usage on my system. And none of the nonsense of the unpredictability of the page cache."

These new patches currently reside in his [2]buffered-uncached.2 Git branch. In [3]the patch adding the RWF_UNCACHED flag, he explains:

"Add RWF_UNCACHED as a read operation flag, which means that any data read will be removed from the page cache upon completion. Uses the page cache to synchronize, and simply prunes folios that were instantiated when the operation completes.

...

You can think of uncached buffered IO as being the much more attractive cousin of O_DIRECT - it has none of the restrictions of O_DIRECT. Yes, it will copy the data, but unlike regular buffered IO, it doesn't run into the unpredictability of the page cache in terms of reclaim. As an example, on a test box with 32 drives, reading them with buffered IO looks as follows:

Reading bs 65536, uncached 0

1s: 145945MB/sec
2s: 158067MB/sec
3s: 157007MB/sec
4s: 148622MB/sec
5s: 118824MB/sec
6s: 70494MB/sec
7s: 41754MB/sec
8s: 90811MB/sec
9s: 92204MB/sec
10s: 95178MB/sec
11s: 95488MB/sec
12s: 95552MB/sec
13s: 96275MB/sec

where it's quite easy to see where the page cache filled up, and performance went from good to erratic, and finally settles at a much lower rate.

...

If the same test case is run with RWF_UNCACHED set for the buffered read, the output looks as follows:

Reading bs 65536, uncached 1

1s: 153144MB/sec
2s: 156760MB/sec
3s: 158110MB/sec
4s: 158009MB/sec
5s: 158043MB/sec
6s: 157638MB/sec
7s: 157999MB/sec
8s: 158024MB/sec
9s: 157764MB/sec
10s: 157477MB/sec
11s: 157417MB/sec
12s: 157455MB/sec
13s: 157233MB/sec
14s: 156692MB/sec

which is just chugging along at ~155GB/sec of read performance.

...

where just the test app is using CPU, no reclaim is taking place outside of the main thread. Not only is performance 65% better, it's also using half the CPU to do it."
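The "65% better" figure can be sanity-checked against the per-second numbers quoted above. The sketch below is only a back-of-the-envelope check, not Axboe's methodology: it assumes that "steady state" means the average of the last four samples of each run, and the function names are made up for illustration.

```python
# Per-second read throughput in MB/sec, copied from the two runs quoted above.
BUFFERED = [145945, 158067, 157007, 148622, 118824, 70494, 41754,
            90811, 92204, 95178, 95488, 95552, 96275]
UNCACHED = [153144, 156760, 158110, 158009, 158043, 157638, 157999,
            158024, 157764, 157477, 157417, 157455, 157233, 156692]


def steady_state(samples, window=4):
    """Average of the last `window` samples (an assumed definition of steady state)."""
    tail = samples[-window:]
    return sum(tail) / len(tail)


def improvement_pct(baseline, candidate, window=4):
    """Percentage by which the candidate's steady-state rate exceeds the baseline's."""
    base = steady_state(baseline, window)
    cand = steady_state(candidate, window)
    return (cand / base - 1.0) * 100.0


if __name__ == "__main__":
    print(f"buffered steady state: {steady_state(BUFFERED):.0f} MB/sec")
    print(f"uncached steady state: {steady_state(UNCACHED):.0f} MB/sec")
    print(f"improvement: {improvement_pct(BUFFERED, UNCACHED):.1f}%")
```

With that window the result lands close to the 65% quoted in the commit message. Comparing against the worst buffered sample (41754MB/sec at 7s) would give a far larger gap, which is why the settled rate is the fairer baseline.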

Now that is a beautiful win.

Another [4]patch adds support for RWF_UNCACHED for buffered writes:

"If RWF_UNCACHED is set for a write, mark the folios being written with drop_writeback. Then writeback completion will drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest...the behavior is fully predictable, performing the same throughout even after the page cache would otherwise have fully filled with dirty data. It's also about 75% faster, and using half the CPU of the system compared to the normal buffered write."

That's some really great work, and with results this exciting it will hopefully make it into the mainline Linux kernel.



[1] https://x.com/axboe/status/1854244924633252074?s=09

[2] https://git.kernel.dk/cgit/linux/log/?h=buffered-uncached.2

[3] https://git.kernel.dk/cgit/linux/commit/?h=buffered-uncached.2&id=a29e99bd45b3e99dc13e3a8b245435a86a1afe55

[4] https://git.kernel.dk/cgit/linux/commit/?h=buffered-uncached.2&id=3e4915125ca07d5f22d165ebe041a3e9713acae2


