News: 0001608628


AMD Squeezing Out More ROCm/HIP Performance With New Device-Side PGO

([AMD] 3 Hours Ago Profile Guided Optimizations)


Compiler [1]profile-guided optimization (PGO) has paid off well for CPU performance: application- or workload-specific profiles are fed back to the compiler so it can make better-informed decisions. AMD compiler engineers have been working on device-side PGO for the AMDGPU LLVM back-end so that ROCm/HIP workloads can achieve greater GPU performance. An initial pull request is now open for upstream LLVM.

AMD engineer Sam Liu opened the LLVM pull request, which adds offload profiling support with an initial emphasis on uniformity-aware optimization in the AMDGPU back-end. It targets HIP/AMDGPU workloads, enabling profile-guided compiler optimizations of GPU kernels.

He explained the work at length in [2]this LLVM Discourse RFC, published minutes ago to seek feedback from the upstream LLVM developer community.

"This RFC proposes device-side Profile Guided Optimization (PGO) for HIP/AMDGPU, enabling profile-guided compiler optimizations for GPU kernels.

The key contributions are:

Device PGO infrastructure – instrumentation, profile collection, and consumption pipeline for AMDGPU device code, using only standard HIP APIs (no CLR patches required).

Uniformity-aware PGO – a safety mechanism that detects whether branches are uniform (all threads take the same path) or divergent at runtime, and gates certain optimizations accordingly.

The uniformity detection is essential because GPU execution follows the SIMT (Single Instruction, Multiple Threads) model, where standard CPU PGO assumptions about “cold” code paths do not hold. Without this safeguard, PGO-guided optimizations like spill placement can cause performance regressions on divergent branches."
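To make the uniform/divergent distinction concrete, here is a minimal CPU-side sketch in C++ that models one wave as a 64-bit lane mask and classifies each execution of a branch. It is not taken from the RFC or the pull request: the `BranchUniformity` record, the `record_branch` and `uniform_fraction` helpers, and the 64-lane mask width are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-branch profile record; the field names are illustrative
// and do not come from the RFC or the LLVM pull request.
struct BranchUniformity {
    std::uint64_t uniform_waves = 0;    // waves where all active lanes agreed
    std::uint64_t divergent_waves = 0;  // waves where active lanes split
};

// Record one wave reaching a branch.
//   active_mask: bit i is set if lane i is active at the branch.
//   taken_mask:  bit i is set if lane i evaluates the condition as true.
inline void record_branch(BranchUniformity& rec,
                          std::uint64_t active_mask,
                          std::uint64_t taken_mask) {
    assert(active_mask != 0 && "a wave with no active lanes cannot reach a branch");
    const std::uint64_t taken = taken_mask & active_mask;  // ignore inactive lanes
    // Uniform: either no active lane took the branch, or every active lane did.
    const bool uniform = (taken == 0) || (taken == active_mask);
    if (uniform) {
        ++rec.uniform_waves;
    } else {
        ++rec.divergent_waves;
    }
}

// Fraction of observed waves that were uniform (0.0 when nothing was observed).
inline double uniform_fraction(const BranchUniformity& rec) {
    const std::uint64_t total = rec.uniform_waves + rec.divergent_waves;
    if (total == 0) {
        return 0.0;
    }
    return static_cast<double>(rec.uniform_waves) / static_cast<double>(total);
}
```

On real hardware the two masks would presumably come from wave-level ballot-style operations rather than from function arguments; the sketch only shows the classification rule itself.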

The RFC thread goes on to cover the traditional challenges of applying compiler PGO techniques to GPUs rather than CPUs, the different use cases, HIPRTC for workload-adaptive optimizations, and applying PGO to static HIP applications. It is a lengthy and technical read for those interested in compiler internals.

Meanwhile, [3]this is the LLVM pull request for the initial code, which describes the work as follows:

Key features:

- Wave-aggregated counter increments to reduce atomic contention

- Per-TU contiguous counter allocation to avoid linker reordering issues

- Uniformity detection to identify wave-uniform vs divergent branches

- Uniformity-aware spill placement to prevent PGO regressions on GPUs
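The first item in the list above, wave-aggregated counter increments, can be illustrated with a small CPU-side C++ sketch. It assumes a 64-lane wave modeled as a bitmask and uses hypothetical helper names (`per_lane_increment`, `wave_aggregated_increment`, `check_schemes_agree`); it is not the code from the pull request. The idea shown is that a single atomic add of the active-lane count replaces one atomic per lane.

```cpp
#include <atomic>
#include <bitset>
#include <cassert>
#include <cstdint>

// Naive scheme: every active lane issues its own atomic increment.
// Returns how many atomic operations were issued.
inline unsigned per_lane_increment(std::atomic<std::uint64_t>& counter,
                                   std::uint64_t active_mask) {
    unsigned atomics = 0;
    for (unsigned lane = 0; lane < 64; ++lane) {
        if (active_mask & (std::uint64_t{1} << lane)) {
            counter.fetch_add(1, std::memory_order_relaxed);
            ++atomics;
        }
    }
    return atomics;
}

// Wave-aggregated scheme: count the active lanes once, then let a single
// "leader" issue one atomic add for the whole wave.
// Returns how many atomic operations were issued (0 or 1).
inline unsigned wave_aggregated_increment(std::atomic<std::uint64_t>& counter,
                                          std::uint64_t active_mask) {
    const std::uint64_t active_lanes = std::bitset<64>(active_mask).count();
    if (active_lanes == 0) {
        return 0;
    }
    counter.fetch_add(active_lanes, std::memory_order_relaxed);
    return 1;
}

// Both schemes must leave the counter at the same value; only the amount of
// atomic traffic differs.
inline void check_schemes_agree(std::uint64_t active_mask) {
    std::atomic<std::uint64_t> naive{0};
    std::atomic<std::uint64_t> aggregated{0};
    per_lane_increment(naive, active_mask);
    wave_aggregated_increment(aggregated, active_mask);
    assert(naive.load() == aggregated.load());
}
```

For a fully active 64-lane wave, this model issues one atomic instead of 64 while recording the same count, which is the contention reduction the feature list refers to.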

The uniformity detection is critical because standard PGO can cause severe performance regressions on GPUs. When PGO moves register spills to "cold" paths, but those paths are entered divergently (different threads take different paths), partial-wave memory accesses cause poor coalescing and up to 3.7x slowdown. By detecting uniformity at profile collection time and gating spill placement decisions, we achieve:

- 12-14% speedup on uniform branches

- No regression on divergent branches (gating prevents the issue)
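The gating described above can be sketched as a simple decision rule in C++. Everything here is an assumption for illustration: the `BranchProfile` fields, the `choose_spill_placement` function, and both threshold values are hypothetical and are not taken from the RFC or the pull request. The sketch only shows the shape of the logic: a path that looks cold is accepted for spill placement only when the profile also says the branch was (almost) always wave-uniform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profile summary for the branch guarding a candidate "cold" path.
struct BranchProfile {
    std::uint64_t taken = 0;            // executions that entered the path
    std::uint64_t not_taken = 0;        // executions that skipped it
    std::uint64_t uniform_waves = 0;    // waves where all active lanes agreed
    std::uint64_t divergent_waves = 0;  // waves where active lanes split
};

enum class SpillPlacement { KeepDefault, MoveToColdPath };

// Decide whether PGO may move register spills onto the rarely taken path.
// Both thresholds are illustrative assumptions, not values from the patch.
inline SpillPlacement choose_spill_placement(const BranchProfile& p,
                                             double cold_threshold = 0.05,
                                             double uniform_threshold = 0.99) {
    assert(cold_threshold >= 0.0 && cold_threshold <= 1.0);
    assert(uniform_threshold >= 0.0 && uniform_threshold <= 1.0);

    const std::uint64_t executions = p.taken + p.not_taken;
    const std::uint64_t waves = p.uniform_waves + p.divergent_waves;
    if (executions == 0 || waves == 0) {
        return SpillPlacement::KeepDefault;  // no data: stay conservative
    }

    const double taken_fraction =
        static_cast<double>(p.taken) / static_cast<double>(executions);
    const double uniform_fraction =
        static_cast<double>(p.uniform_waves) / static_cast<double>(waves);

    const bool looks_cold = taken_fraction < cold_threshold;
    const bool mostly_uniform = uniform_fraction >= uniform_threshold;

    // The gate: a cold-looking path is only trusted when the branch is uniform.
    return (looks_cold && mostly_uniform) ? SpillPlacement::MoveToColdPath
                                          : SpillPlacement::KeepDefault;
}
```

Under these assumed thresholds, a branch taken 1% of the time by fully uniform waves qualifies for spill relocation, while the same 1% branch with half of its waves divergent keeps the default placement.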

It looks promising so far, and it will be exciting to see how this PGO work pans out for AMD ROCm/HIP.



[1] https://www.phoronix.com/search/profile+guided+optimization

[2] https://discourse.llvm.org/t/rfc-offload-pgo-for-hip-amdgpu-device-side-profile-guided-optimization/89577

[3] https://github.com/llvm/llvm-project/pull/177665


