
Bytedance Proposes Faster Linux Inter-Process Communication With "Run Process As Library"

([Linux Kernel] 3 Hours Ago Linux RPAL)


Bytedance engineers are exploring faster inter-process communication (IPC) on Linux via a new approach they call Run Process As Library (RPAL). Their initial benchmark shows average per-message latency for 32-byte messages dropping by roughly 91% compared with the current epoll path.

The goal is faster Linux IPC that needs only minimal modifications at the application level. Run Process As Library "RPAL" comes down to a framework that lets one process invoke another as if making a local function call, bypassing the Linux kernel on that path.

The core RPAL objectives come down to:

"1. Data-plane efficiency: Reduce the number of data copies from two (in the shared memory solution) to one.

2. Control-plane optimization: Eliminate the overhead of system calls and kernel's thread switches.

3. Application compatibility: Minimize the modifications to existing applications that utilize Unix domain sockets and the epoll() family."

For readers who mainly care about the net benefit, the reported numbers are striking:

"During testing, the client transmitted 1 million 32-byte messages, and we computed the per-message average latency. The results are as follows:

*****************

Without RPAL: Message length: 32 bytes, Total TSC cycles: 19616222534,

Message count: 1000000, Average latency: 19616 cycles

With RPAL: Message length: 32 bytes, Total TSC cycles: 1703459326,

Message count: 1000000, Average latency: 1703 cycles

*****************

These results confirm that RPAL delivers substantial latency improvements over the current epoll implementation—achieving a 17,913-cycle reduction (an ~91.3% improvement) for 32-byte messages.

We have applied RPAL to an RPC framework that is widely used in our data center. With RPAL, we have successfully achieved up to 15.5% reduction in the CPU utilization of processes in real-world microservice scenario. The gains primarily stem from minimizing control plane overhead through the utilization of userspace context switches. Additionally, by leveraging address space sharing, the number of memory copies is significantly reduced."
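The headline figures can be re-derived from the quoted totals. The short check below uses only the numbers in the benchmark output above; the variable names are illustrative.

```python
# Totals quoted in the RFC benchmark output.
MESSAGE_COUNT = 1_000_000
TOTAL_CYCLES_WITHOUT_RPAL = 19_616_222_534
TOTAL_CYCLES_WITH_RPAL = 1_703_459_326

# Average latency per message, truncated to whole cycles as in the quoted output.
avg_without = TOTAL_CYCLES_WITHOUT_RPAL // MESSAGE_COUNT
avg_with = TOTAL_CYCLES_WITH_RPAL // MESSAGE_COUNT

# Absolute and relative reduction in per-message latency.
reduction_cycles = avg_without - avg_with
reduction_percent = reduction_cycles / avg_without * 100

# The same result expressed as a speedup factor.
speedup = avg_without / avg_with

if __name__ == "__main__":
    print(f"average without RPAL: {avg_without} cycles")
    print(f"average with RPAL:    {avg_with} cycles")
    print(f"reduction: {reduction_cycles} cycles ({reduction_percent:.1f}%)")
    print(f"speedup:   {speedup:.1f}x")
```

The 17,913-cycle and ~91.3% figures match the quoted claim, and the same data corresponds to roughly an 11.5x lower average latency.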

RPAL depends on Memory Protection Keys (MPK) hardware support. On the Intel side that means newer processors, while on the AMD side MPK support is found only with Zen 4 processors and newer. Later patches from Bytedance may allow RPAL to be used without MPK support.
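To see whether a given x86 Linux machine exposes protection keys, one option is to look for the `pku` flag (CPU support) and the `ospke` flag (enabled by the kernel) in `/proc/cpuinfo`. The sketch below assumes that flag format. The helper names are hypothetical, and this is not necessarily how the RPAL patches detect support.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def mpk_status(cpuinfo_text):
    """Report whether protection keys are advertised and OS-enabled."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "pku": "pku" in flags,      # CPU advertises protection keys
        "ospke": "ospke" in flags,  # kernel has turned the feature on
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(mpk_status(cpuinfo.read_text()))
    else:
        print("no /proc/cpuinfo on this system")
```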

Those interested in this work can see this RFC patch series [1] for the initial public patches around the Run Process As Library feature.



[1] https://lore.kernel.org/lkml/CAP2HCOmAkRVTci0ObtyW=3v6GFOrt9zCn2NwLUbZ+Di49xkBiw@mail.gmail.com/


