News: 0001498457


NVIDIA Posts Linux Patches For GPU Direct RDMA For Device Private Pages



Building on the existing Linux support for GPU Direct RDMA / Peer-To-Peer DMA, NVIDIA today posted a set of patches that extend this P2P DMA support to device-private pages.

There has been a lot of Linux work on [1]P2PDMA to avoid copies through system memory when moving data between devices/accelerators. The patches posted this afternoon focus on providing GPU Direct RDMA (P2P DMA) for device-private memory pages.

[2]

NVIDIA engineer Yonatan Maman explained in the patch series:

"This patch series aims to enable Peer-to-Peer (P2P) DMA access in GPU-centric applications that utilize RDMA and private device pages. This enhancement is crucial for minimizing data transfer overhead by allowing the GPU to directly expose device private page data to devices such as NICs, eliminating the need to traverse system RAM, which is the native method for exposing device private page data."

Besides the infrastructure changes to the Linux memory management code and Heterogeneous Memory Management (HMM), the patch series also adds code to the NVIDIA Mellanox MLX5 driver for optimizing PCIe peer-to-peer access to device-private pages, as well as P2P DMA support to the Nouveau driver. The Nouveau support allows the driver to "handle P2P page operations seamlessly" -- not that you'd really be using the Nouveau open-source DRM driver for any very demanding workloads at this point. Adding it to Nouveau demonstrates the functionality using fully open-source drivers and provides the open-source user that upstream kernel policy requires.

See [3]this patch series for the GPU Direct RDMA proposal for device private pages.



[1] https://www.phoronix.com/search/P2PDMA

[2] https://www.phoronix.com/image-viewer.php?id=2024&image=gpu_direct_private_lrg

[3] https://lore.kernel.org/dri-devel/20241015152348.3055360-1-ymaman@nvidia.com/



phoronix
