Expanded Reset Support Coming For AMDGPU To Recover From More GPU Compute Hangs

([Radeon] 11 Minutes Ago Pipe Reset Support)


A set of 42 patches was posted on Thursday for the AMDGPU kernel driver and the associated AMDKFD compute driver code to enable pipe reset capabilities for compute workloads.

The AMDGPU driver has long supported queue resets to help recover from hangs. Pipe reset support is now being worked on as the next recovery step for cases where resetting a single queue does not bring the GPU back into a working state.

AMDGPU maintainer Alex Deucher authored many of the patches in this big patch series for the pipe reset support. Deucher elaborated on the functionality:

"There are certain corner cases where a queue reset is not able to recover a hung queue. A pipe reset can recover some of those cases, however, when the pipe is reset all queues on that pipe are reset. This requires coordination across all components using compute queues. There is quite a bit of prep work in this series, some of which I sent out previously. Another prerequisite for this was reworking the userq reset path. It should be more straight-forward now. The final patch also needs to be updated once the new MES firmware is released so we can check the proper firmware versions. Using older MES firmware may fail and end up in an adapter reset in some cases where the pipe reset would have worked so it should be comparable to the current behavior."
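The escalation described in the quote can be pictured with a small toy model. This is only an illustrative sketch in Python, not the driver's actual code: the `ResetLevel` enum, the `recover` function, the queue names, and the two boolean flags are all hypothetical and simply stand in for "did this reset level succeed". It shows why a pipe reset needs coordination: every queue sharing the pipe is affected, not just the hung one.

```python
from enum import Enum


class ResetLevel(Enum):
    """Hypothetical labels for the three recovery steps mentioned in the article."""
    QUEUE = 1
    PIPE = 2
    ADAPTER = 3


def recover(hung_queue, pipes, queue_reset_works, pipe_reset_works):
    """Toy escalation ladder: queue reset, then pipe reset, then adapter reset.

    pipes maps a pipe id to the list of queue names hosted on that pipe.
    Returns the reset level used and the queues that were reset by it.
    """
    # Find the pipe that hosts the hung queue.
    pipe_id = next(p for p, queues in pipes.items() if hung_queue in queues)

    if queue_reset_works:
        # Least disruptive: only the hung queue is reset.
        return ResetLevel.QUEUE, [hung_queue]
    if pipe_reset_works:
        # All queues on the same pipe are reset together.
        return ResetLevel.PIPE, list(pipes[pipe_id])
    # Last resort (e.g. the pipe reset failed): the whole adapter is reset.
    return ResetLevel.ADAPTER, [q for queues in pipes.values() for q in queues]
```

For example, with `pipes = {0: ["q0", "q1"], 1: ["q2"]}` and a hang on `q1`, a failed queue reset followed by a successful pipe reset also resets `q0`, while `q2` on the other pipe is left alone.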

Those interested in pipe reset support for AMD GPU compute workloads can see this patch series [1] for the latest work on improving the AMD GPU recovery process on Linux.



[1] https://lists.freedesktop.org/archives/amd-gfx/2026-May/145122.html


