Intel To Upstream Habana Labs Network Drivers Into The Linux Kernel
- Reference: 0001470729
- News link: https://www.phoronix.com/news/HabanaLabs-Network-Driver-Linux
While the [1]Habana Labs AI accelerator driver has been in the mainline Linux kernel for years, this "accel" driver has focused solely on training/inference support across their products. Now being upstreamed to the mainline Linux kernel are the Habana Labs network drivers, which can be used to scale out AI workloads across multiple systems.
A set of 15 patches was posted on Thursday by Intel-owned Habana Labs to enable networking support on Gaudi 2, for scaling AI neural networks across systems connected via Ethernet or InfiniBand. Engineer Omer Shpigelman explained:
This patch set implements the HabanaLabs network drivers for Gaudi2 ASIC which is designed for scaling of AI neural networks training. The patch set includes the common code which is shared by all Gaudi ASICs and the Gaudi2 ASIC specific code. Newer ASICs code will be followed. All of these network drivers are modeled as an auxiliary devices to the parent driver.
The newly added drivers are Core Network (CN), Ethernet and InfiniBand. All of these drivers are based on the existing habanalabs driver which serves as the compute driver and the entire platform. The habanalabs driver probes the network drivers which configure the relevant NIC HW of the device. In addition, it continuously communicates with the CN driver for providing some services which are not NIC specific e.g. PCI, MMU, FW communication etc.
The CN driver is both a parent and a son driver. It serves as the common layer of many shared operations that are required by both EN and IB drivers.
The Gaudi2 NIC HW is composed of 48 physical lanes, 56Gbps each. Each pair of lanes represent a 100Gbps logical port.
The NIC HW was designed specifically for scaling AI training. Hence it basically functions as a regular NIC device but it is tuned for its dedicated purpose. As a result, the NIC HW supports Ethernet traffic and RDMA over modified ROCEv2 protocol.
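The driver layering described in the quote can be pictured as a small tree. The Python sketch below is only an illustration of that description, not kernel code: the Driver class, build_stack() and walk() are hypothetical names, and placing EN and IB directly under CN is an assumption drawn from the statement that CN is both a parent and a child driver.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Driver:
    """Illustrative node in the driver tree; not a kernel structure."""
    name: str
    role: str
    children: List["Driver"] = field(default_factory=list)

    def attach(self, child: "Driver") -> "Driver":
        """Register a child (standing in for an auxiliary device) and return it."""
        self.children.append(child)
        return child


def build_stack() -> Driver:
    # habanalabs is the existing compute driver and platform layer.
    compute = Driver("habanalabs", "compute driver, platform services (PCI, MMU, FW)")
    # CN is a child of habanalabs and, by assumption here, the parent of EN and IB.
    cn = compute.attach(Driver("CN", "common layer shared by EN and IB"))
    cn.attach(Driver("EN", "Ethernet"))
    cn.attach(Driver("IB", "InfiniBand"))
    return compute


def walk(node: Driver, depth: int = 0) -> List[str]:
    """Depth-first listing, parents before children, indented by depth."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(walk(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(walk(build_stack())))
```

Run as a script, this prints habanalabs at the top, CN indented beneath it, and EN and IB indented beneath CN.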
It's surprising that Intel/Habana has taken this long to upstream these network drivers given how long Gaudi 2 has been available, but at least it's happening now. Gaudi 3 is also on the way with even greater networking capabilities thanks to 24 x 200 GbE ports.
Those interested in these Habana Labs networking drivers can see [2]this patch series for the code now under review for the mainline Linux kernel. In their current form, these new network drivers amount to 148k lines of new code.
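For a rough sense of scale, the port figures quoted in this article can be worked through in a few lines of Python. This is only back-of-the-envelope arithmetic on the numbers given here (48 lanes at 56Gbps paired into 100Gbps logical ports for Gaudi 2, 24 x 200 GbE for Gaudi 3). The function name port_summary is made up for illustration, and the source does not explain the gap between the raw lane rate and the logical port rate.

```python
def port_summary(lanes: int, lanes_per_port: int, port_gbps: int):
    """Return (logical_ports, aggregate_gbps) for a NIC built from paired lanes."""
    if lanes % lanes_per_port != 0:
        raise ValueError("lane count must divide evenly into ports")
    ports = lanes // lanes_per_port
    return ports, ports * port_gbps


# Gaudi 2, per the patch cover letter: 48 lanes, 2 lanes per 100Gbps logical port.
gaudi2_ports, gaudi2_gbps = port_summary(lanes=48, lanes_per_port=2, port_gbps=100)

# Raw signalling capacity of the same lanes at 56Gbps each (before any overhead).
gaudi2_raw_gbps = 48 * 56

# Gaudi 3, per the article: 24 ports of 200 GbE.
gaudi3_gbps = 24 * 200

if __name__ == "__main__":
    print(f"Gaudi 2: {gaudi2_ports} ports, {gaudi2_gbps} Gbps logical, "
          f"{gaudi2_raw_gbps} Gbps raw")
    print(f"Gaudi 3: {gaudi3_gbps} Gbps, "
          f"{gaudi3_gbps / gaudi2_gbps:.1f}x Gaudi 2")
```

On these figures, Gaudi 2 exposes 24 logical ports for 2.4 Tbps in aggregate, and Gaudi 3 doubles that to 4.8 Tbps with the same port count.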
[1] https://www.phoronix.com/search/Habana+Labs
[2] https://lore.kernel.org/lkml/20240613082208.1439968-1-oshpigelman@habana.ai/