NVGRACE-GPU VFIO Driver Preparing For NVIDIA Grace Blackwell
([NVIDIA] 5 Hours Ago
NVIDIA Grace Blackwell VFIO Support)
- Reference: 0001496567
- News link: https://www.phoronix.com/news/NVGRACE-GPU-GB-Blackwell
- Source link:
The NVGRACE-GPU VFIO driver was introduced to provide Virtual Function I/O support for the NVIDIA Grace Hopper Superchip, so that the GPU device could be assigned to guests under KVM/QEMU and similar virtualization stacks. The NVGRACE-GPU driver is now being extended to support the forthcoming NVIDIA Grace Blackwell "GB" designs.
A set of patches posted on Sunday extends the NVGRACE-GPU VFIO driver for Grace Blackwell. This work is needed so that the Blackwell GPU can be assigned to guests in the virtualized environments that are so common these days.
NVIDIA engineer Ankit Agrawal explained the driver changes for accommodating Grace Blackwell VFIO support:
"NVIDIA's recently introduced Grace Blackwell (GB) Superchip in continuation with the Grace Hopper (GH) superchip that provides a cache coherent access to CPU and GPU to each other's memory with an internal proprietary chip-to-chip (C2C) cache coherent interconnect. The in-tree nvgrace-gpu driver manages the GH devices. The intention is to extend the support to the new Grace Blackwell boards.
There is a HW defect on GH to support the Multi-Instance GPU (MIG) feature [1] that necessiated the presence of a 1G carved out from the device memory and mapped uncached. The 1G region is shown as a fake BAR (comprising region 2 and 3) to workaround the issue.
The GB systems differ from GH systems in the following aspects.
1. The aforementioned HW defect is fixed on GB systems.
2. There is a usable BAR1 (region 2 and 3) on GB systems for the GPUdirect RDMA feature.
This patch series accommodate those GB changes by showing the real physical device BAR1 (region2 and 3) to the VM instead of the fake one. This takes care of both the differences."
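As a rough illustration of the difference described in the quote, the following is a minimal Python sketch of which backing the guest would see behind regions 2 and 3 on each platform. It is a toy model, not the driver's code: the names `Platform`, `Bar1View`, and `bar1_view_for_guest` are hypothetical, and the region numbers and descriptions are taken only from the cover letter text above.

```python
from dataclasses import dataclass
from enum import Enum


class Platform(Enum):
    """Hypothetical labels for the two superchip generations named in the quote."""
    GRACE_HOPPER = "GH"
    GRACE_BLACKWELL = "GB"


@dataclass(frozen=True)
class Bar1View:
    """What the guest sees behind regions 2 and 3 (illustrative model only)."""
    regions: tuple
    is_fake: bool
    description: str


def bar1_view_for_guest(platform: Platform) -> Bar1View:
    """Return the modeled BAR1 view for a guest, per the quoted cover letter."""
    if platform is Platform.GRACE_HOPPER:
        # GH: a 1G carve-out of device memory, mapped uncached, is presented
        # as a fake BAR to work around the MIG-related hardware defect.
        return Bar1View(
            regions=(2, 3),
            is_fake=True,
            description="1G uncached carve-out from device memory",
        )
    # GB: the defect is fixed and BAR1 is usable (e.g. for GPUdirect RDMA),
    # so the real physical device BAR1 is shown to the VM instead.
    return Bar1View(
        regions=(2, 3),
        is_fake=False,
        description="real physical device BAR1",
    )


if __name__ == "__main__":
    for platform in Platform:
        view = bar1_view_for_guest(platform)
        kind = "fake" if view.is_fake else "real"
        print(f"{platform.value}: regions {view.regions} -> {kind} BAR ({view.description})")
```

In this model the region numbers stay the same on both platforms and only the backing differs, which is one way to read how showing the real BAR1 "takes care of both the differences."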
These patches are now out for review on the Linux kernel mailing list [1].
[1] https://lore.kernel.org/lkml/20241006102722.3991-1-ankita@nvidia.com/