GCC 16 Lands Improved Memmove Behavior For x86/x86_64 CPUs
- Reference: 0001588834
- News link: https://www.phoronix.com/news/GCC-16-x86-Inline-Memmove
H.J. Lu, a long-time compiler expert at Intel, today merged improved memmove() behavior into the GNU Compiler Collection ahead of the upcoming GCC 16 release.
The change to GCC's x86 back end inlines memmove using overlapping unaligned loads and stores; per the commit message, the inlining applies to 64-bit code. H.J. Lu laid out his rationale in [1]the patch:
"x86-64: Inline memmove with overlapping unaligned loads and stores
Inline memmove in 64-bit since there are much less registers available in 32-bit:
1. Load all sources into registers and store them together to avoid possible address overlap between source and destination.
2. For known size, first try to fully unroll with 8 registers.
3. For size <= 2 * MOVE_MAX, load all sources into 2 registers first and then store them together.
4. For size > 2 * MOVE_MAX and size <= 4 * MOVE_MAX, load all sources into 4 registers first and then store them together.
5. For size > 4 * MOVE_MAX and size <= 8 * MOVE_MAX, load all sources into 8 registers first and then store them together.
6. For size > 8 * MOVE_MAX,
a. If address of destination > address of source, copy backward with a 4 * MOVE_MAX loop with unaligned loads and stores. Load the first 4 * MOVE_MAX into 4 registers before the loop and store them after the loop to support overlapping addresses.
b. Otherwise, copy forward with a 4 * MOVE_MAX loop with unaligned loads and stores. Load the last 4 * MOVE_MAX into 4 registers before the loop and store them after the loop to support overlapping addresses.
Verified and benchmarked memmove implementations inlined with GPR, SSE2, AVX2 and AVX512 using glibc memmove tests.
...
Their performances are comparable with optimized memmove implementations in glibc on Intel Core i7-1195G7."
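To make the load-first, store-later idea concrete, here is a minimal C sketch of steps 3 and 6 from the commit message. It is an illustration only and not GCC's generated code: it assumes an 8-byte general-purpose register stands in for MOVE_MAX, and the names CHUNK, BLOCK, sketch_move_small and sketch_move_large are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one 8-byte general-purpose register stands in for MOVE_MAX. */
enum { CHUNK = 8, BLOCK = 4 * CHUNK };

/* Unaligned 8-byte load and store expressed through memcpy. */
static inline uint64_t load8(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline void store8(unsigned char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Load or store one block of four chunks ("4 registers"). */
static inline void load_block(uint64_t r[4], const unsigned char *p)
{
    for (int k = 0; k < 4; k++)
        r[k] = load8(p + k * CHUNK);
}

static inline void store_block(unsigned char *p, const uint64_t r[4])
{
    for (int k = 0; k < 4; k++)
        store8(p + k * CHUNK, r[k]);
}

/* Step 3 analogue for CHUNK <= n <= 2 * CHUNK: the two loads overlap in the
   middle when n < 2 * CHUNK, and both complete before either store. */
void sketch_move_small(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n >= CHUNK && n <= 2 * CHUNK);
    uint64_t head = load8(src);
    uint64_t tail = load8(src + n - CHUNK);
    store8(dst, head);
    store8(dst + n - CHUNK, tail);
}

/* Step 6 analogue. This sketch is valid for any n >= BLOCK; the commit
   message describes using the loop only above 8 * MOVE_MAX. */
void sketch_move_large(unsigned char *dst, const unsigned char *src, size_t n)
{
    uint64_t edge[4], r[4];
    assert(n >= BLOCK);

    if ((uintptr_t)dst > (uintptr_t)src) {
        /* 6a: save the first block, copy backward, store the saved block last. */
        load_block(edge, src);
        for (size_t i = n; i > BLOCK; i -= BLOCK) {
            load_block(r, src + i - BLOCK);
            store_block(dst + i - BLOCK, r);
        }
        store_block(dst, edge);
    } else {
        /* 6b: save the last block, copy forward, store the saved block last. */
        load_block(edge, src + n - BLOCK);
        for (size_t i = 0; i + BLOCK < n; i += BLOCK) {
            load_block(r, src + i);
            store_block(dst + i, r);
        }
        store_block(dst + n - BLOCK, edge);
    }
}
```

In both large-size branches the block saved before the loop covers whatever remainder the loop does not reach, so no separate byte-by-byte tail handling is needed. The copy direction is chosen so that each store only overwrites source bytes that have already been loaded.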
The code was merged this morning ahead of [2]GCC 16's stage 3 milestone this month. GCC 16.1, the first stable release of [3]GCC 16, should be out around March or April.
[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b41f96465190751561f6909e858604ceab00595b
[2] https://www.phoronix.com/news/GCC-16-Stage-3-Next-Month
[3] https://www.phoronix.com/search/GCC+16