Updated GCC Patches For OpenMP Unified Shared Memory On AMD & NVIDIA GPUs
- Reference: 0001475407
- News link: https://www.phoronix.com/news/GCC-OpenMP-USM-AMD-NV-2024
Two years after patches were first posted for Unified Shared Memory (USM) support for OpenMP in the GNU Compiler Collection (GCC), there is finally an updated patch set implementing this shared memory functionality for both AMD and NVIDIA GPUs.
Andrew Stubbs with BayLibre posted the latest patches for implementing OpenMP Unified Shared Memory functionality within GCC on AMD and NVIDIA GPUs. Stubbs explained in the patch cover letter:
"The series implements OpenMP's "Unified Shared Memory" concept, first for NVidia GPUs, and then for AMD GPUs. We already have a very simple implementation of USM that works on integrated APU devices and any other device that supports shared memory access natively. This new implementation replaces that implementation in the case where using "managed memory" is likely to be a win (the usual non-APU case).
In theory, explicit mapping of exactly the right memory with carefully hand-optimized "to" and "from" directives is the most optimal implementation (except possibly in the case where the data is too large for the device). Experimentally, the "dumb" USM implementation we already have performs quite well with modern devices and drivers. This new managed memory implementation appears to fall between the two, and can outperform explicit mapping in the non-trivial cases (e.g. many small mappings, sparse data, rectangular copies, etc.)
The trade-off for the additional performance is added complexity and malloc/free is no longer compatible with external libraries (e.g. strdup)."
The implementation includes two new GNU extensions to OpenMP, each consisting of an allocator and a matching memory space: ompx_gnu_unified_shared_mem_alloc / ompx_gnu_unified_shared_mem_space and ompx_gnu_host_mem_alloc / ompx_gnu_host_mem_space.
The OpenMP Unified Shared Memory support with these patches can be enabled at compile-time using -foffload-memory=unified for the NVIDIA NVPTX and AMDGCN targets.
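For readers unfamiliar with what USM changes in practice, here is a minimal sketch, not taken from the patch series. It assumes the standard OpenMP `requires unified_shared_memory` directive and uses a hypothetical helper name, `usm_double_and_sum`. The point it illustrates is that a target region can dereference an ordinary heap pointer without a `map` clause for the array; only the scalar reduction result is mapped explicitly.

```c
#include <stdlib.h>

/* Declare that this translation unit relies on unified shared memory.
   This is a standard OpenMP 5.0 directive; it must appear at file scope. */
#pragma omp requires unified_shared_memory

/* Hypothetical helper (illustration only): fill a heap array on the host,
   double every element inside a target region, and return the sum.
   Returns -1 if the allocation fails. */
long usm_double_and_sum(int n)
{
    int *data = malloc((size_t)n * sizeof *data);
    if (data == NULL)
        return -1;

    for (int i = 0; i < n; i++)
        data[i] = i;

    long sum = 0;

    /* No map clause for data[0:n]: under USM the device is expected to
       access the host allocation through the same pointer. The scalar
       result is mapped explicitly so it is copied back to the host. */
    #pragma omp target teams distribute parallel for \
            map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        data[i] *= 2;
        sum += data[i];
    }

    free(data);
    return sum;
}
```

Assuming a GCC build with offloading enabled and these patches applied, the file would be compiled with something like `gcc -fopenmp -foffload-memory=unified usm.c`. Without an offload device, or without `-fopenmp`, the loop simply runs on the host and produces the same result.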