Intel Sees a 3888.9% Performance Improvement in the Linux Kernel - From One Line of Code (phoronix.com)
- News link: https://linux.slashdot.org/story/24/11/09/0654221/intel-sees-a-38889-performance-improvement-in-the-linux-kernel---from-one-line-of-code
- Source link: https://www.phoronix.com/news/Intel-Linux-3888.9-Performance
> Intel's Linux kernel test robot has reported a 3888.9% performance improvement in the mainline Linux kernel as of this past week...
>
> Intel thankfully has the resources to maintain this automated service for per-kernel commit/patch testing and has been maintaining their public kernel test robot for years now to help catch performance changes, both positive and negative, in the Linux kernel code. The commit in question causing this massive uplift to performance is [2]mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes. The patch message confirms it will fix some prior performance regressions and deliver some major uplift in specialized cases...
>
> That mmap patch merged last week affects just one line of code.
This week [3]The Register also reported that Linus Torvalds revised a previously submitted security tweak that addressed the Spectre and Meltdown security holes, writing in his commit message that "The kernel test robot reports a 2.6 percent improvement in the per_thread_ops benchmark."
[1] https://www.phoronix.com/news/Intel-Linux-3888.9-Performance
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4148aeab412432bf928f311eca8a2ba52bb05df
[3] https://www.theregister.com/2024/11/06/torvalds_patch_linux_performance/
Misleading (Score:5, Informative)
How about the other part of the article:
This change has been shown to regress some workloads significantly.
One report notes regressions in various SPEC benchmarks, with up to a 600% slowdown.
Re:Misleading (Score:5, Insightful)
Also, no mention of AMD. Is anybody savvy enough to tell if this should apply to AMD as well?
Re:Misleading (Score:4, Interesting)
The misalignment caused the 600% slowdown in benchmarks, according to the commit message.
This fixes that by skipping this codepath for unaligned requests (roughly the check sketched below).
What's weird is that Linus reverted the offending patch two years ago for the same reason, but the committer put it back a few weeks later.
[1]https://git.kernel.org/pub/scm... [kernel.org]
To me this looks like a partial solution with room for doing it right.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ba09b1733878afe838fe35c310715fda3d46428
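For anyone wondering what "skipping this codepath for unaligned requests" amounts to, here is a minimal userspace sketch of the idea as I read the commit message, not the actual kernel diff; the function name and the 2 MiB constant are my own illustrative assumptions:

/*
 * Hedged userspace sketch of the idea described in the commit message,
 * NOT the actual kernel diff: only attempt PMD-aligned (THP-friendly)
 * placement when the requested length is itself a multiple of PMD_SIZE.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define PMD_SIZE (2UL * 1024 * 1024)              /* 2 MiB on typical x86_64 */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Decide whether a hypothetical allocator should try THP-aligned placement. */
static bool want_thp_alignment(size_t len)
{
    /* Before the patch (roughly): any request of at least PMD_SIZE qualified. */
    /* After the patch (roughly): the length must also be PMD-aligned, so      */
    /* odd-sized requests skip the alignment codepath entirely.                */
    return len >= PMD_SIZE && IS_ALIGNED(len, PMD_SIZE);
}

int main(void)
{
    size_t lens[] = { 1UL << 20, 2UL << 20, 3UL << 20, (2UL << 20) + 4096 };

    for (size_t i = 0; i < sizeof(lens) / sizeof(lens[0]); i++)
        printf("len=%zu -> THP-aligned placement: %s\n",
               lens[i], want_thp_alignment(lens[i]) ? "yes" : "no");
    return 0;
}

Running it shows a 2 MiB or 16 MiB request taking the aligned path while a 1 MiB or 2 MiB + 4 KiB request does not, which is the whole point of the one-liner.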
Re: (Score:2)
Ah, I see. I read that wrong!
Re: Misleading (Score:2)
So, asking: improvements in things like threading while things like pixel mapping take a hit? No big deal, I think.
"Performance improvement" (Score:3)
Way too much dramatization. There's no way it's a 38x overall performance improvement; that would defy all logic, and something like that would hit mainstream news. So next time you want people to swallow BS, use a much smaller number, say 3.8%.
Re: (Score:2)
That's just your ignorance talking. The 38x improvement was in a very specific case; TFS even implies as much. And yes, a buggy implementation of code can definitely screw up your results like this, and in some cases even worse.
Just because you're afraid of big numbers or don't understand what is going on doesn't mean it isn't actually real.
Sounds like they are trying (Score:2)
Too little, too late.
Re: (Score:2)
It's not an Intel specific change. This impacts all platforms.
Re: (Score:1)
Not really. And Intel is dead. They will just take a while to die.
Re:Sounds like they are trying (Score:4, Interesting)
Intel is not dead (cue the shovel to the head). They do need to get back to their knitting.
Remember Apple was pronounced dead once, and so was AMD.
On the other hand GE did die due to thinking they were a bank.
So, the question is: should Intel concentrate on the super chips that have bragging potential but a market of 0.1% of users, or on the power-efficient desktop/laptop market?
I ask because looking at the M4 I realized I can't really put my M1 Air to full load. My Linux box has about the same performance, twice the RAM, six times the storage, and five times the USB ports. Apple's desktops are not a good fit for me.
Re: (Score:2)
Intel is dead. All they ever had was superior manufacturing, which came from their _memory_ business. That is over. Their CPUs always sucked in one way or another, and they have not managed any innovation for a long time now. And recently, they try to pull stunts like jeopardizing CPU reliability to boost performance. That has the stink of desperation.
Read the fine print. (Score:4, Interesting)
> The patch message confirms it will fix some prior performance regressions and deliver some major uplift in specialized cases...
If you read [1]the THP documentation [kernel.org] you'll learn that "THP only works for anonymous memory mappings and tmpfs/shmem", which means that unless you're making anonymous, tmpfs, or shared-memory mappings in excess of PMD_SIZE bytes (2 MiB on my system), this has no impact. A crude way to check your own mappings is sketched below.
It seems unlikely that many programs will see much difference in performance, but it's always nice to see improvements added to the kernel.
[1] https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html
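The promised crude check, my own sketch and not anything from the patch: map a large anonymous region and see whether the address you get back is PMD-aligned, which is the precondition for it to be trivially backed by transparent huge pages. Assumes x86_64 with 2 MiB PMDs and THP enabled.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL * 1024 * 1024)

int main(void)
{
    size_t len = 8 * PMD_SIZE;                   /* 16 MiB, a multiple of PMD_SIZE */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("addr=%p  PMD-aligned: %s\n", p,
           ((uintptr_t)p % PMD_SIZE == 0) ? "yes" : "no");
    munmap(p, len);
    return 0;
}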
That's just f'ing great... (Score:5, Funny)
Now all my crappy code will crash much faster. ;)
Re: (Score:2)
Hey, the faster it crashes, the faster a fix may come.*
*excluding Microsoft software which relies on crashes
Impacts all CPUs (Score:3)
This isn't an Intel-specific change or even an x86_64-specific change; this impacts every Linux platform. The only reason "Intel Sees..." is in the headline is that they are the ones doing regression testing for the kernel.
Let me guess (Score:2)
DEBUG_BUILD=0;
It really does not matter how many lines of code (Score:3)
That is maybe a curiosity, but irrelevant. Also, 4000%? Most of that will _not_ map to general performance.
All there is to see here is pretty stupid reporting.
Re: (Score:2)
> All that is to see here is pretty stupid reporting.
I see only stupid people failing to see that TFS doesn't claim a general performance improvement. In fact, it literally uses the phrase "uplift in specialized cases".
Re: (Score:2)
Yeah, there's a whole lot of skeptics on Slashdot basing their skepticism on, "That's a really big number. Can't be right." I will remind them that there is a big difference between skepticism and doubt. Skepticism demands more than a surface evaluation.
Re: (Score:2)
I throw 35 years of CS experience, a CS engineering PhD, experience with CPUs at all levels, and an understanding of system design into the pot. Is that enough for you to make it "more than a surface evaluation"?
Re: (Score:2)
You cannot keep quiet when you have nothing to say, can you? I am pointing something out. I am not making a claim. Of course, the difference is lost on you.
great (Score:2)
Runs an infinite loop in a couple of minutes now.
Impact, please. (Score:1)
Is this something that now takes microseconds instead of milliseconds for something that isn't done often, or something that takes milliseconds instead of large fractions of a second for something that's maybe done once or twice per boot on your average system?
If it were something done often enough to be noticeable by Joe Average Linux User, I'd expect it to be bigger news.
Older equipment (Score:2)
The group I volunteer with got half a truckload of older systems. A speed-up in the kernel would be welcome for these systems to see new life and boldly go where no CPU has gone before.
The only problem... (Score:2)
The specialized case seeing a 38x increase in performance is the HCF instruction.
But boy does it burn HOT!
Re: (Score:3)
"Intel's Linux kernel test robot has reported a 3888.9% performance improvement in the mainline Linux kernel as of this past week..."
Or, in normal language everyone can immediately understand, a 38-odd times improvement. Why the insane urge to express everything in percentages? It's almost as bad as using microfortnights instead of seconds.
Re: (Score:2)
> Why the insane urge to express everything in percentages?
To make the number bigger, obviously. People don't understand basic math, so the framing matters (the conversion is spelled out below). For example:
"Sure, they say it's 3800% faster, but it's really just a 38x improvement."
People also don't understand how taxes work. The combination can be expensive: "I can give you a 2% raise, but it'll put you just over the line into the next tax bracket. It's up to you."
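Getting back to the percentages for a second, and assuming the robot's number is a percent increase in the benchmark's rate, the conversion is just:

\[
  \frac{\text{new}}{\text{old}} \;=\; 1 + \frac{p}{100} \;=\; 1 + \frac{3888.9}{100} \;\approx\; 39.9
\]

So +3888.9% is a gain of about 38.9x over the old rate, meaning the benchmark runs at roughly 39.9x its previous speed.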
Re: (Score:2)
> People also don't understand how taxes work. The combination can be expensive: "I can give you a 2% raise, but it'll put you just over the line into the next tax bracket. It's up to you."
I'm always amazed by this because all semi-sane tax systems have marginal tax brackets, i.e. the higher tax rate only applies to the part of the income that falls in the new bracket.
Re: (Score:2)
More than a few conservative anti-tax politicians and news organizations like to misrepresent how a graduated income tax works in order to keep the rubes fearful. If all you do is look up your tax owed in a table or have someone else prepare it for you, you may not understand how the formula works.
Re: riiiiiight (Score:5, Interesting)
Or a really simple optimisation that was non-obvious. Aligning things in memory can have an *enormous* performance impact. Finding a place where things weren't aligned and making sure they are now is very much the kind of thing I'd expect to be a massive win. I used to work at one of the major OS vendors, and this absolutely is the kind of thing we'd on occasion find, completely legitimately.
Re: (Score:3)
And it's not just memory alignment that can generate great improvements. Not that long ago, the Linux kernel got a huge boost in network performance by reordering the elements in a large struct to significantly reduce the chances of cache misses.
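A toy illustration of that point (nothing to do with the actual kernel struct; the field names are made up): hot fields scattered among cold ones land on different cache lines, while grouping them keeps them together. offsetof() makes the difference visible, assuming the usual 64-byte line on x86_64.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE 64
#define LINE_OF(type, field) (offsetof(type, field) / CACHE_LINE)

struct scattered {                  /* hot fields interleaved with cold ones */
    uint64_t hot_counter;
    char     cold_name[120];
    uint64_t hot_flags;
    char     cold_description[120];
    uint64_t hot_refcount;
};

struct grouped {                    /* hot fields packed together up front */
    uint64_t hot_counter;
    uint64_t hot_flags;
    uint64_t hot_refcount;
    char     cold_name[120];
    char     cold_description[120];
};

int main(void)
{
    printf("scattered: counter on line %zu, flags on line %zu, refcount on line %zu\n",
           LINE_OF(struct scattered, hot_counter),
           LINE_OF(struct scattered, hot_flags),
           LINE_OF(struct scattered, hot_refcount));
    printf("grouped:   counter on line %zu, flags on line %zu, refcount on line %zu\n",
           LINE_OF(struct grouped, hot_counter),
           LINE_OF(struct grouped, hot_flags),
           LINE_OF(struct grouped, hot_refcount));
    return 0;
}

The scattered version touches three different cache lines to update its three hot counters; the grouped version touches one.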
Re: riiiiiight (Score:2)
Automatic struct optimization isn't a thing in C yet?
Re: (Score:2)
One reason C doesn't reorder struct fields is that it's forbidden by the standard.
In any case, to make it happen optimally you'd need to know the access patterns, but C compilation units are typically compiled separately, so you wouldn't know what the optimal order would be. And how do you automatically determine the best order for fields in the first place, when it depends on how those fields are accessed? The problem becomes even more intractable if the fields are part of a published API.
Something like profile-guided optimization might help here.
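Right, the standard requires member addresses to increase in declaration order, so layout and padding are whatever you wrote, and any reordering is on the programmer. A quick toy demonstration (my own, not kernel code) of what that can cost; tools like pahole report the same thing from debug info:

#include <stdint.h>
#include <stdio.h>

struct as_written {     /* 1 + 7 (pad) + 8 + 1 + 7 (pad) = 24 bytes on x86_64 */
    uint8_t  a;
    uint64_t b;
    uint8_t  c;
};

struct reordered {      /* 8 + 1 + 1 + 6 (pad) = 16 bytes */
    uint64_t b;
    uint8_t  a;
    uint8_t  c;
};

int main(void)
{
    printf("as_written: %zu bytes, reordered: %zu bytes\n",
           sizeof(struct as_written), sizeof(struct reordered));
    return 0;
}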
Re: (Score:2)
Or, more specifically to this optimisation, align things with respect to the data the benchmark uses. The summary doesn't go into the 6x slower performance in other scenarios.
It's a trade-off between aligned memory and fragmented memory.