Linus Torvalds Defends Windows' Blue Screen of Death (itsfoss.com)
- Reference: 0180303357
- News link: https://linux.slashdot.org/story/25/12/06/0155255/linus-torvalds-defends-windows-blue-screen-of-death
- Source link: https://itsfoss.com/news/torvalds-blue-screen-of-death/
> In that video, Sebastian discussed Torvalds' fondness for ECC ([2]Error Correction Code). I am using their last name because Linus will be confused with Linus. This is where Torvalds says this: "I am convinced that all the jokes about how unstable Windows is and blue screening, I guess it's not a blue screen anymore, a big percentage of those were not actually software bugs. A big percentage of those are hardware being not reliable."
>
> Torvalds further mentioned that gamers who overclock get extra unreliability. Essentially, Torvalds believes that having ECC on the machine makes them more reliable, makes you trust your machine. Without ECC, the memory will go bad, sooner or later. He thinks that more than software bugs, often it is hardware behind Microsoft's blue screen of death.
You can watch the video [3]on YouTube (the BSOD comments occur at ~9:37).
[1] https://itsfoss.com/news/torvalds-blue-screen-of-death/
[2] https://en.wikipedia.org/wiki/Error_correction_code?ref=itsfoss.com
[3] https://www.youtube.com/watch?v=mfv0V1SxbNA&t=577s
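For readers unfamiliar with the ECC mentioned in the summary, here is a minimal sketch of the underlying idea: a toy Hamming(7,4) code that stores 4 data bits with 3 parity bits and can repair any single flipped bit. This is an illustration only. Real ECC memory typically protects much wider words with a stronger code, and the function names below are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome is the 1-based flipped position, or 0."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        # A non-zero syndrome names the position of the single bad bit; flip it back.
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1  # simulate one bit flipping in memory
    recovered, position = hamming74_decode(stored)
    print(recovered == word, position)
```

Without the parity bits, the flipped bit would be handed back to the software as if it were valid data, which is the kind of silent corruption Torvalds is describing.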
Bless his heart...but... (Score:1)
It's been a while since I managed on-prem Windows Server instances - like 15 years ago - but I certainly remember seeing some BSODs on machines that had ECC memory!
Re: (Score:2)
He didn't say "all": he said a "big percentage."
Linus is right, but this is really not news (Score:2)
That the infamous Windows BSoDs, at least since the WinNT era started, are almost always caused by dodgy hardware is common knowledge to anybody who has spent the least amount of time as a support tech on Windows machines. It's true that they could be better at communicating this.
I've never used ECC in my personal machines - I'm sure it's great - but since the early 00's or so, BSoDs are just not a thing that regular users experience unless they have bottom-tier or broken hardware, and people that buy lo
Re: (Score:3)
Most people who have (not so) fond memories of the BSoD predate that era and experienced it on a daily basis. The problem was drastically reduced going from Windows 95 to 98 to 2000/XP, to the extent that it's impossible for hardware to be the primary culprit. Windows dominated the landscape, but they weren't the only OS around and nothing else was that unstable despite using the hardware of that era. Before NT, Windows was an absolute mess. I think the only reason most people put up with it was that they d
Re: (Score:2)
Win9x and Win2k (and the other NT descendants) are fundamentally different operating systems. In general, NT had a much more robust kernel, so system panics were and remain mainly hardware issues, or, particularly in the old days, dodgy drivers (which is just another form of hardware issue). I've seen plenty of panics on *nix systems and Windows systems, and I'd say probably 90-95% were all hardware failures, mainly RAM, but on a few occasions something wrong with the CPU itself or with other critical hardw
Re: Linus is right, but this is really not news (Score:2)
Adding to that I was primarily a Linux user even back then, but would occasionally dual boot into Windows. So same hardware, BSOD galore or at least frequent enough to be very annoying, but Linux as stable as could be and not a kernel panic in sight.
So clearly not all down to unreliable hardware.
Re: (Score:2)
> Before NT, Windows was an absolute mess. I think the only reason most people put up with it was that they didn't know anything better was possible and since Windows was so widespread it was a misery everyone shared.
I think that many of those people were also recent DOS users. Given that DOS systems would often simply freeze up several times per day and require a reboot (easy to do since any bug in the user's application could do this), once they added a protected mode pseudo-kernel to Windows (maybe starting with Windows/386 2.1), it was actually a slight improvement over what they were used to since DOS crashes could sometimes be isolated to one virtual terminal.
Re: (Score:2)
With Win10 in recent months I noticed that the Nvidia video driver hiccups and throws a BSOD practically every time the OS gets an update. The fix is to completely remove the driver and related files using a tool like DDU and then to reinstall the driver. The BSODs stop... until the next OS update. Which is why I keep the PC disconnected from the network unless completely, unavoidably necessary.
No BSOD but Linux PANIC (Score:4, Interesting)
Back in 1995, on a brand new PC configured dual boot, no overclocking, I would get PANIC messages from the Linux kernel but Windows worked fine, no BSOD. Out of despair I reached out to Linus Torwalds and he very kindly helped me out, suggesting I had defective memory chips. He was correct. There were 2 defective memory chips. I guess Windows was just too slow to reveal the defective chips.
Re: (Score:1)
I meant Torvalds. Apologies.
Re:No BSOD but Linux PANIC (Score:4, Interesting)
Linux has utilized - pretty much forever - all the available memory as cache/buffer, so you were bound to run into the problem much sooner.
Win95/98/ME could run for a long time without ever accessing particular physical memory chips.
Windows NT didn't have this problem, but on the other hand WinNT and successors also had better isolation, so if a driver crashed due to a memory issue, it recovered better (this really applies to WinNT 3.5 and perhaps 4, back when it was still going with Dave Cutler's VMS-derived approach - WinNT 3.5 is almost a microkernel).
Re: (Score:2)
It could also be that Windows didn't allocate memory in the same pattern as Linux, and by chance didn't happen to need to use the defective chips. I've certainly seen Windows BSODs due to bad memory, so I know it's capable of detecting such problems.
Re: (Score:2)
It's nothing to do with speed. Windows / Linux has no ability to read or write faster or slower to RAM, the speed is set at boot. It's about accessing a broken area with data critical enough to cause a system error.
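The point made in the comments above, that a bad cell only matters once something actually touches it, can be sketched with a simulated memory. Everything here is hypothetical: `FaultyRAM` models one byte with a bit stuck at 1, and `find_corruption` is a crude write-then-read-back check, not how any real OS or memory tester is implemented.

```python
class FaultyRAM:
    """Simulated byte-addressed RAM in which one cell has a bit stuck at 1."""

    def __init__(self, size, bad_addr, stuck_mask=0x04):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.stuck_mask = stuck_mask

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.bad_addr:
            value |= self.stuck_mask  # the stuck bit always reads back as 1
        return value


def find_corruption(ram, addresses):
    """Write a pattern and its complement; return addresses that read back wrong."""
    addresses = list(addresses)
    bad = []
    # Two complementary patterns are needed: a bit stuck at 1 is invisible
    # whenever the pattern written already has a 1 in that position.
    for pattern in (0x55, 0xAA):
        for addr in addresses:
            ram.write(addr, pattern)
        for addr in addresses:
            if ram.read(addr) != pattern and addr not in bad:
                bad.append(addr)
    return bad


if __name__ == "__main__":
    ram = FaultyRAM(size=1024, bad_addr=900)
    print(find_corruption(ram, range(256)))   # workload that stays in low memory
    print(find_corruption(ram, range(1024)))  # workload that touches everything
```

A workload confined to the first 256 bytes never sees the fault, while one that uses all of memory (as a kernel filling RAM with cache would) runs into it.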
Re: (Score:2)
How did you confirm that this was the cause of the problem? By replacing the memory chips with known good ones?
That, and cooling (Score:2)
Bad cooling was a huge problem for PCs as well. There were not only the cheap fans that quickly wore out, but also people who kept the computer in a drawer, or on the floor where it acted as a stationary Roomba vacuuming up all the dust within reach.
did anyone answer that question? (Score:2)
Why is it that the vast majority of PC builders don't care about ECC memory? Why didn't Microsoft push for this instead of TPM 2?
Cheap (Score:3)
Because ECC adds price and, usually, is slower than regular memory. What has mainly driven PC hardware is gaming, and gamers care about speed, not long-term stability.
RAM speed doesn't matter as much as it used to for framerates, though, unless you are overclocking a ton, in which case you don't care about stability anyways.
What's impressive to ... (Score:2)
I've described the boot process of a modern PC to people in the past, to make this point.
From power on to firmware and POST. Then the lights are turned on for all the areas of hardware responsibility. Some happen right away, others further down the line. Boot managers, OS, drivers... logins, more drivers, then - finally - the system subsides into an orderly management of resources using an incredible juggling act of interrupt management and carefully segregated multitasking where everybody must be orderly.
A
BSoD was an indicator (Score:5, Insightful)
BSoD was telling you what was going on, but they made it difficult to understand what to do. When you are the main OS normal people use, then you need to make it clear what is going on.
Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didn't make it easy or help the end user, it turned into "it's an MS problem and MS sucks."
Every time I got a BSoD and debugged it, it was always a driver or hardware.
This has nothing to do with today's BS Windows crapware on their systems, just the general history of BSoD.
With that I agree, BSoD was blamed by default, because it was telling you the problem in a terrible way.
Re:BSoD was an indicator (Score:4, Interesting)
I did the following things that reduced BSODs massively.
1/ UPS with brownout protection.
2/ Put the swap file on its own partition.
3/ Move all applications and data to their own partition leaving the C: drive with only Windows itself on it.
My experience was that most BSODs fell into 3 categories: power fluctuations, programs (including Windows itself) interfering with the swap file, and programs interfering with Windows on the hard drive. Also, after that, defrags were much less needed, as the only partition that fragmented badly was the C: drive, with Windows fragmenting itself.
Re: (Score:2)
None of that makes any sense unless you have a drive that is woefully unreliable and starts corrupting shit in flight.
There's no "interfering". You either write to the windows directory or you don't. You either write to the swap file or you don't (actually no you just don't full stop, software has no basis for nor ability to interfere with the swapfile without operating under elevated privileges). It doesn't matter where windows files are, they are always in the same place: %windir%, and software doesn't ma
Re: (Score:2)
Yeah, most BSODs I've seen were from shitty drivers of cheap hardware. Sometimes shitty drivers of expensive hardware. Sometimes shitty software with too many privileges.
Hard to think of something else, but then I've been away from windoze for over 2 decades now.
Re: (Score:3)
I don't think it's fair to blame Microsoft for cryptic BSOD logs. When a memory chip goes bad, the OS doesn't have any way to know whether the chip is bad, or whether some driver bug caused the memory checksums to fail. If a component overheats and starts spitting out garbage, how is the BSOD supposed to diagnose that? If hardware is installed that isn't quite compatible, is the BSOD supposed to be able to display a nice, human-friendly message telling you that the model number of your component is a mismat
Re: (Score:2)
> BSoD was telling you what was going on, but they made it difficult to understand what to do.
The BSoD only ever gave you enough information to tell you what driver crashed. Or a simple error code. It still does. That hasn't changed.
> Error logs and crash reports could tell you a lot if you knew how to get to them. But since MS didn't make it easy or help the end user, it turned into "it's an MS problem and MS sucks."
Be careful what you wish for. Error logs and the tools are great and all, but if a user is unable to go read on the MSDN Docs how to debug something, they will not have a hope in hell of understanding the debug output either. Kernel panics are no better in this regard. The average user (heck, the average power user) has no hope in hell of understanding what went wron