MemryX MX3 M.2 Module Delivers Nice AI Performance With A Great Software Experience
- News link: https://www.phoronix.com/review/memryx-mx3-m2
While there are a growing number of startups offering AI accelerators, many of them are more or less vaporware, and even among those actually shipping products, the big challenge is that their software stacks tend to be very immature or an outright mess. Surprisingly, there is a company called MemryX, born out of AI research at the University of Michigan, that is both shipping actual hardware -- and at a decent price point -- and backing it with a software stack that is a pleasant experience on both Windows and Linux. Here are my initial experiences testing the MemryX M.2 module that features four of their in-house MX3 AI accelerator chips.
The MemryX MX3 chip is an AI accelerator rated at 6 TOPS of compute, and up to 16 of the chips can be interconnected to deliver up to 96 TOPS in total. The MX3 supports 4/8/16-bit weights plus BFloat16, and this current-generation chip can handle up to 10.5 million 8-bit parameters per chip. Each MX3 chip consumes 2 Watts or less and can be connected to the host via PCIe Gen 3 or USB 3.
The MemryX M.2 module (MX3-2280-M-4) is their first end-user (developer) ready product. It places four MX3 chips on an M.2 2280 form factor board that can be easily installed into any system with a PCIe Gen 3 M.2 slot. The module provides 24 TOPS of AI compute at 6~8 Watts of power use. 24 TOPS by itself isn't impressive, but again these chips can scale out across multiple modules, and presumably MemryX will follow up with higher-end accelerator cards that approach the 16-chip maximum.
The M.2 module is all the more impressive considering the low power draw and that it can be ordered today for just $149 USD. Each MX3 chip can hold only up to 10.5 million 8-bit parameters due to its lack of DRAM, so the four chips together can handle models of up to 42 million parameters. MemryX did tease that they will have a PCIe card coming out next year with more MX3 chips for handling models with larger parameter counts. (The [5]MemryX Model Explorer outlines some models that have been verified and tested on their current hardware.)
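To put those capacity numbers in perspective, here's a quick back-of-the-envelope sketch in plain Python (my own arithmetic, not vendor code) showing how the per-chip limits scale from a single MX3 up through the four-chip M.2 module and the 16-chip interconnect maximum:

    # Back-of-the-envelope scaling from MemryX's published MX3 specs:
    # 6 TOPS and up to 10.5 million 8-bit parameters per chip.
    PARAMS_PER_CHIP = 10_500_000  # 8-bit parameters held on-chip (no DRAM)
    TOPS_PER_CHIP = 6

    def budget(chips: int) -> tuple[float, int]:
        """Return (parameter capacity in millions, total TOPS) for N chips."""
        return chips * PARAMS_PER_CHIP / 1e6, chips * TOPS_PER_CHIP

    for chips in (1, 4, 16):
        params_m, tops = budget(chips)
        print(f"{chips:2d} chip(s): {params_m:6.1f}M params, {tops:2d} TOPS")

    #  1 chip(s):   10.5M params,  6 TOPS
    #  4 chip(s):   42.0M params, 24 TOPS  <- the M.2 module reviewed here
    # 16 chip(s):  168.0M params, 96 TOPS  <- the interconnect maximum

That 42 million parameter ceiling is also why the teased PCIe card with more MX3 chips matters for models with larger parameter counts.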
Given the low power requirements, the MemryX M.2 module can be cooled simply with the passive heatsink that is included.
It's one thing to actually ship hardware, but what makes the experience worthwhile is that the software stack isn't in a half-assed state like so many other AI vendors, nor is it hampered by poor documentation or limited Linux distribution compatibility. The current MemryX drivers are out-of-tree from the Linux kernel but largely open-source, and MemryX offers Debian/Ubuntu packages as well as a generic Linux binary installer that make it very easy to set up the MX3 chips and get inferencing. I typically hesitate to accept review samples of new accelerator chips given the usually dodgy software state and less-than-complete documentation, but I was impressed by the thoroughness of the [8]MemryX Developer Hub and how simple their [9]getting started guide is. There are instructions for both Windows and Linux systems. Setting up the software stack on both Ubuntu 24.04 LTS and Ubuntu 24.10 was a breeze without any software setup issues: very little time needed and no frustrating hoops to jump through.
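For those curious what the post-install experience looks like, here is a minimal sanity-check sketch. Fair warning: the Python package name and the device node pattern below are my assumptions rather than verbatim from the docs, so treat the getting started guide as authoritative:

    # Minimal post-install sanity check (a sketch; the "memryx" package
    # name and /dev/memx* node pattern are assumptions, not verified
    # against the official documentation).
    import glob
    import importlib.util

    # Did the SDK's Python bindings land in this environment?
    print("Python bindings found:", importlib.util.find_spec("memryx") is not None)

    # The DKMS kernel driver should expose a character device per module
    # once loaded (the node naming here is an assumption).
    print("Candidate device nodes:", glob.glob("/dev/memx*"))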
As for how open the MemryX software stack is: their MxAccl C++ runtime is open-source, as are the MxUtils pre/post-processing plug-ins and helper toolkit, the Linux kernel driver, and the Python runtime. Over the coming year MemryX says they are working to open-source their Linux/Windows libmemx driver library and their neural compiler under MIT/LGPL licenses. Once the compiler and user-space driver library are opened up, perhaps we'll see them pursue mainlining their Linux kernel driver (just as it took a while for Habana Labs due to their initial lack of an open-source user-space compiler and associated code), but for now the kernel code is distributed as DKMS modules.
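To give a feel for the shape of that Python runtime, here is a hypothetical sketch of the typical flow: compile a model with their neural compiler to the accelerator's format, then stream inputs through the runtime. The class and method names below (AsyncAccl, connect_input, connect_output, the .dfp model artifact) are assumptions on my part, so consult the Developer Hub API reference before copying any of this:

    # Hypothetical sketch of streaming inference through the MemryX
    # Python runtime. AsyncAccl, connect_input, connect_output, and the
    # .dfp artifact are assumed names, not a verified API.
    import numpy as np
    from memryx import AsyncAccl  # assumption: the SDK's async runtime class

    frames = iter(np.random.rand(100, 224, 224, 3).astype(np.float32))

    def next_frame():
        # Stand-in input source; a real app would feed preprocessed
        # camera/video frames matching the compiled model's input shape.
        return next(frames, None)  # returning None ends the stream

    def handle_output(*outputs):
        # Invoked asynchronously as results come back from the MX3 chips.
        print("got", [o.shape for o in outputs])

    # model.dfp would be produced ahead of time by MemryX's neural
    # compiler from an ONNX/Keras/TFLite model.
    accl = AsyncAccl("model.dfp")
    accl.connect_input(next_frame)
    accl.connect_output(handle_output)
    accl.wait()  # block until the input stream is exhausted

The appeal of a callback-driven design like this is that host-side pre/post-processing can overlap with inference on the accelerator rather than serializing around it.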
[5] https://developer.memryx.com/model_explorer/models.html
[8] https://developer.memryx.com/
[9] https://developer.memryx.com/get_started/index.html