LLVM Clang Switches MMX Intrinsics To Use SSE2 Instead
([LLVM] 4 Hours Ago
SSE2 In Place Of MMX Usage)
- Reference: 0001481064
- News link: https://www.phoronix.com/news/LLVM-Clang-MMX-Intrinsics-SSE2
Following LLVM/Clang recently [1]dropping support for AMD 3DNow! instructions, the open-source compiler stack is now pushing the MMX SIMD instruction set to the back seat. Going forward, the MMX intrinsics will no longer use MMX instructions but will instead be mapped to SSE2. This is fine unless you want to use this modern compiler on an Intel Pentium MMX / Pentium II / Pentium III or AMD K6 / K7 processor from the late 1990s.
For those on one of the few Intel and AMD processor series that support MMX but not SSE2, LLVM/Clang 20 and later will no longer generate vectorized code from the __m64 intrinsics. On any processor from roughly the last two decades, basically the Pentium 4 and newer, the intrinsics map cleanly to SSE2 and should cause no trouble.
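For code that must still build for pre-SSE2 targets, one possible approach is to guard the vector path on the compiler's predefined __SSE2__ macro and fall back to scalar code otherwise. The following is a minimal sketch under that assumption; the helper name add8_sat_i16 is hypothetical and does not come from the LLVM change itself.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics operating on 128-bit __m128i values */
#endif

/* Hypothetical helper: out[i] = saturate(a[i] + b[i]) for eight int16 lanes. */
static void add8_sat_i16(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
#if defined(__SSE2__)
    /* Vector path: unaligned loads, one saturating add, unaligned store. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_adds_epi16(va, vb));
#else
    /* Scalar path for targets without SSE2 (for example, -march=pentium3). */
    for (int i = 0; i < 8; ++i) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
#endif
}
```

Both paths should produce the same results, so callers do not need to know which one was compiled in.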
This is part of the LLVM effort to remove the use of MMX registers and, in turn, improve the compiler code. The [3]pull request by James Knight explains:
"The MMX instruction set is legacy, and the SSE2 variants are in every way superior, when they are available -- and they have been available since the Pentium 4 was released, 20 years ago.
Therefore, we are switching the "MMX" intrinsics to depend on SSE2, unconditionally. This change entirely drops the ability to generate vectorized code using compiler intrinsics for chips with MMX but without SSE2: the Intel Pentium MMX, Pentium II, and Pentium III (released 1997-1999), as well as AMD K6 and K7 series chips of around the same timeframe. Targeting these older CPUs remains supported -- simply without the ability to use MMX compiler intrinsics.
Migrating away from the use of MMX registers also fixes a rather non-obvious requirement. The long-standing programming model for these MMX intrinsics requires that the programmer be aware of the x87/MMX mode-switching semantics, and manually call _mm_empty() between using any MMX instruction and any x87 FPU instruction. If you neglect to, then every future x87 operation will return a NaN result. This requirement is not at all obvious to users of these intrinsic functions, and causes very difficult to detect bugs.
Worse, even if the user did write code that correctly calls _mm_empty() in the right places, LLVM may sometimes reorder x87 and mmx operations around each-other, unaware of this mode switching issue."
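To make the _mm_empty() requirement described in the pull request concrete, here is a minimal sketch of the traditional pattern: __m64 intrinsic work, then _mm_empty(), then x87-style floating-point math. The helper names add4_sat_i16 and mean4_after_simd are hypothetical. It assumes an x86 target; long double is used because that type typically goes through the x87 FPU on x86 Linux, whereas float and double on x86-64 normally use SSE registers and would not show the hazard.

```c
#include <mmintrin.h>   /* __m64 and the "MMX" intrinsics such as _mm_adds_pi16 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out[i] = saturate(a[i] + b[i]) for four int16 lanes. */
static void add4_sat_i16(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, vsum;
    memcpy(&va, a, sizeof va);      /* load 8 bytes into each __m64 value */
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_adds_pi16(va, vb);   /* signed saturating 16-bit add */
    memcpy(out, &vsum, sizeof vsum);

    /* Leave MMX state before any later x87 code runs. On compilers that
       still place __m64 values in MMX registers, skipping this call is
       what leads to the NaN results described in the quote above. */
    _mm_empty();
}

/* Hypothetical caller: integer SIMD work followed by x87-style math. */
static long double mean4_after_simd(const int16_t a[4], const int16_t b[4])
{
    int16_t sum[4];
    add4_sat_i16(a, b, sum);        /* _mm_empty() has already run inside */
    return ((long double)sum[0] + sum[1] + sum[2] + sum[3]) / 4.0L;
}
```

Keeping the _mm_empty() call next to the intrinsic code, rather than leaving it to each caller, is one way to make the requirement harder to forget when the same source is also built with compilers that still use MMX registers.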
This code is [4]merged and now in the LLVM Git codebase for the LLVM/Clang 20 compiler due out in early 2025.
[1] https://www.phoronix.com/news/LLVM-Ends-AMD-3DNow
[2] https://www.phoronix.com/image-viewer.php?id=2024&image=intel_socket_478_lrg
[3] https://github.com/llvm/llvm-project/pull/96540
[4] https://github.com/llvm/llvm-project/commit/0431d6dab40b05d9f4a312a9c170c81a889bfb49