Patches For AMD GPUs On Loongson Point To "Massive Platform Bug" For These Chinese CPUs
([Radeon] 6 Hours Ago
AMDGPU Patches)
- Reference: 0001471604
- News link: https://www.phoronix.com/news/AMDGPU-Patches-Loongson-Bug
A set of patches posted on Monday aims to get aging AMD Radeon GFX7/GFX8-era graphics processors working on Loongson LoongArch platforms. These patches, which handle old Radeon Hawaii~Polaris GPUs on Loongson, point to a "massive platform bug" with these domestic Chinese systems.
The [1]patch series for the AMDGPU and Radeon kernel drivers aims to address GPU crashes seen with the older AMD Radeon graphics cards when running on Loongson systems.
The graphics driver changes modify a workaround for a cache flushing problem, a workaround that in turn confuses some hardware platforms. The patches were immediately rejected because they disable behavior the driver needs on other platforms.
The ensuing back-and-forth developer conversation on the [2]mailing list led longtime AMD Linux engineer Christian König to sum things up as:
"Well then you have a massive platform bug.
Two consecutive writes to the same bus address are perfectly legal from the PCIe specification and can happen all the time, even without this specific hw workaround."
He further added:
"Well to be honest on a platform where even two consecutive writes to the same location doesn't work I would have strong doubts that it is stable in general."
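The ordering requirement König describes can be illustrated with a toy model. The following is only a sketch under stated assumptions: the `deliver` and `fence_signaled` helpers, the `FENCE_ADDR` address, and the idea of a placeholder value followed by the real value are all hypothetical and are not taken from the AMDGPU driver or from real PCIe hardware. It shows how a consumer that relies on the last of two consecutive writes to one address could be confused if a platform delivered them out of order.

```python
# Toy model only: every name and value here is an illustrative assumption,
# not the AMDGPU driver and not real PCIe behavior.

FENCE_ADDR = 0x1000  # hypothetical bus address holding a fence value


def deliver(writes, reorder=False):
    """Apply (address, value) writes in order to a dict modelling memory.

    With reorder=True the writes are applied in reverse, mimicking a
    hypothetical platform that fails to keep consecutive writes in order.
    """
    ordered = list(reversed(writes)) if reorder else list(writes)
    memory = {}
    for address, value in ordered:
        memory[address] = value
    return memory


def fence_signaled(memory, seq):
    """Return True once the fence location holds at least seq."""
    return memory.get(FENCE_ADDR, 0) >= seq


seq = 42
# Two consecutive writes to the same address: a placeholder, then the real value.
writes = [(FENCE_ADDR, seq - 1), (FENCE_ADDR, seq)]

ok = deliver(writes)                 # writes arrive in program order
bad = deliver(writes, reorder=True)  # hypothetical misbehaving platform

print(fence_signaled(ok, seq))   # True
print(fence_signaled(bad, seq))  # False
```

In this model the in-order case ends with the final value in memory, while the reordered case leaves the stale value behind, so a waiter would never see completion. This is one way such a failure could surface, not a diagnosis of the actual Loongson behavior.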
Further pointing to the fragile state of Loongson hardware are other [3]workarounds under discussion, such as dropping the PCIe link from x16 to x8, tweaking the power management, or even upgrading the chipset's heatsink.
Beyond GPUs, this Loongson platform issue could also point to potential problems with network and storage I/O.
[1] https://lore.kernel.org/dri-devel/20240617105846.1516006-1-uwu@icenowy.me/
[2] https://lore.kernel.org/dri-devel/d44651a7-0c07-4b84-8828-f1d405359aeb@amd.com/
[3] https://lore.kernel.org/dri-devel/e27a5acebe5c7d1e09edbc9dc49f52b672d72988.camel@xry111.site/