Techie went home rather than fix mistake that caused a massive meltdown
- Reference: 1750663574
- News link: https://www.theregister.co.uk/2025/06/23/who_me/
This week, meet a reader we'll Regomize as "Stuart" who hoped his career would see him pursue pure research but found himself working as IT manager for what he described as "a very young startup spun out of a local research center to commercialize a novel gene analysis technique."
The IT team was small – Stuart plus a couple of contractors – but that meant our hero had "very nearly free rein in designing the systems, so the job was anything but boring."
He duly made security keys mandatory for authentication, implemented "paranoia-grade network partitioning" – native IPv6-only, naturally – and created an encrypted, append-only off-site data archive.
"You name it, if I could show it made sense I got to implement it," Stuart told Who, Me?
Stuart spotted one project that made a lot of sense. The startup kept a lot of samples in freezers and was contractually obliged to keep them safe for years. Creating a system to monitor that the freezers were in working order was an obvious win.
Lab-grade freezers are built to make this sort of project doable because they include sensors galore, employ the Modbus serial communication protocol, and can even pack an RS485 serial port to help the machines share sensor info with the world.
Stuart therefore pulled some extra wiring through the cabling ducts, configured a Raspberry Pi 1B to ingest data, and wrote Python scripts to feed info into the startup's monitoring infrastructure.
The resulting system was very simple. If a freezer door was left open too long, alerts went off.
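The "door open too long" rule can be sketched in a few lines of Python. This is only an illustration, not Stuart's script: the `DoorMonitor` class, the 60-second limit, and the sample readings are all assumptions, and the Modbus polling that would supply the readings is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoorMonitor:
    """Tracks how long one freezer door has been open (hypothetical helper)."""

    max_open_seconds: float = 60.0     # assumed limit, not taken from the story
    opened_at: Optional[float] = None  # time the door was first seen open

    def update(self, door_open: bool, now: float) -> bool:
        """Feed one sensor reading; return True if an alert should fire."""
        if not door_open:
            self.opened_at = None      # door closed: reset the timer
            return False
        if self.opened_at is None:
            self.opened_at = now       # door just opened: start the timer
        return (now - self.opened_at) > self.max_open_seconds


# Made-up readings as (seconds since start, door_open) pairs.
monitor = DoorMonitor(max_open_seconds=60.0)
readings = [(0, False), (10, True), (40, True), (80, True), (90, False)]
alerts = [monitor.update(is_open, t) for t, is_open in readings]
print(alerts)
```

Only the reading at 80 seconds raises an alert, because the door has then been open for 70 seconds. Closing the door resets the timer.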
"It proved hugely successful and popular among the lab techies because it was all too easy to leave those doors ajar, so we set out to extend the scope," Stuart wrote.
"It was only then that I realized that the specs provided by the freezer manufacturer were not in total agreement with reality on the freezers we had," he added.
Stuart tried to make sense of it all. One Friday, he accidentally tweaked a setting so the freezers monitored temperatures measured in Fahrenheit instead of Celsius – but didn't change alarm thresholds accordingly.
That has obvious potential for mayhem given that 32° Fahrenheit is 0° Celsius.
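A minimal Python sketch shows how a unit switch alone can trip an alarm. The numbers are invented for illustration: a freezer assumed to sit at -20 °C with an alarm assumed to fire above -15 °C. The function names are hypothetical and nothing here comes from Stuart's actual scripts.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    return temp_c * 9.0 / 5.0 + 32.0


ALARM_ABOVE_C = -15.0  # assumed threshold, defined in Celsius


def naive_alarm(reading: float) -> bool:
    """Compares the raw number to the threshold, ignoring its unit."""
    return reading > ALARM_ABOVE_C


def unit_aware_alarm(reading: float, unit: str) -> bool:
    """Normalizes the reading to Celsius before comparing."""
    if unit == "F":
        reading_c = fahrenheit_to_celsius(reading)
    elif unit == "C":
        reading_c = reading
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    return reading_c > ALARM_ABOVE_C


# A healthy freezer at -20 C reports -4 once the sensor switches to Fahrenheit.
reading_f = celsius_to_fahrenheit(-20.0)

print(reading_f)                         # -4.0
print(naive_alarm(reading_f))            # True: false alarm
print(unit_aware_alarm(reading_f, "F"))  # False: freezer is fine
```

Carrying the unit alongside every reading, or converting once at the point of ingestion, keeps the thresholds valid whichever scale the device reports.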
Within seconds of unwittingly making this mistake, alerts flashed across Stuart's freezer-monitoring dashboards.
He had no idea of the cause, but as the freezer doors were closed, he didn't think it was important enough to warrant an immediate fix.
Then another emergency erupted and consumed the rest of the Friday. Stuart figured he could fix the freezer monitor on Monday, so he turned off his alarms and went home for the weekend.
The next day, an electrical contractor made a worse mistake.
"He managed to somehow trip every single circuit breaker in the building and left without saying a word to anyone," Stuart told Who, Me?
The problem went unnoticed until Monday morning, when Stuart arrived at work and was relieved to find nobody blamed him for the mess.
"Our lease specified redundant power for the freezers and required that weekend electrical work be announced well in advance," he told Who, Me?
Recriminations therefore focused on the landlord, not Stuart's tech.
This story has a happy ending because, despite the meltdown, the samples in the freezers survived intact.
"Even so, my very first thought after coming to work on that day and learning what had happened was 'Oh, BUGGER,'" Stuart told Who, Me?
Has someone else's mistake saved you from being blamed for your own errors? If so, [10]click here to send email to Who, Me? We always appreciate fresh stories to feed you each Monday. ®
[10] mailto:whome@theregister.com
My company's IT team in the US is planning a major infrastructure change on 3rd July!
Anon, obviously - and I may see you after the apocalypse this may cause - or at least after lots of overtime for the IT team over their holiday and lots of downtime for the rest of the world!
We need some common sense in the world of manglement who plan these things.
Someone else's mistake?
Well, no......... but it sure looked like it, and I kept my mouth shut.
I was working at the computer shop, and it was the afternoon, about when I was scheduled to go home. I was doing something on the front computer -- this being a time when the computer shop had a front computer that people could use to browse the web and try out the local shop's computer. Windows 95 era.
Whatever I was doing, I ended up needing to delete some files, and so did a `deltree` command, and.... watched it delete the whole Windows System directory.
:-O oh no oh no ohnoohno....
I was the software techie, amongst the hardware techie, and the kinda-does-sorta-both techie. Anyway, after deleting the data (that was supposed to be deleted - part of the task) and finishing up my task, I went home. (Thinking to myself, run before it *BREAKS*!)
When I got in the next day, the other techie mentioned that Windows crapped out for him shortly after I'd left, and it was just giving _really weird_ problems, and he couldn't get it to reboot. I said I'd look at it, and knowing to myself roughly what the problem was, set about fixing the computer: go to another Windows machine, grab a copy of windows\system, and copy everything to the computer that's acting up.
"It was missing a lot of System files. Weird."
Honestly, this mostly got it working. There were a few oddities forever more with it - but it largely worked. The techie who originally encountered the error was aghost (not quite aghast!) - he said he didn't think I'd be able to bring it back. Fdisk-format-re-install, doo dah, doo dah.. but no. I like to fix machines, not reinstall them. :-)
No one ever mentioned fault, it seemed to be just one of those odd Windows things that just happens sometimes. I spent all day, most every day, fixing them - that was my job there. This one too, fix it up, and on with the next task.
Was it someone else's mistake? Well, no, but... it certainly looked like it at the time to the innocent bystanders. ;-) These days I make far fewer mistakes that actually have impact, and when they do it's usually much more obvious when I've caused the fault. Even when not, I'll raise it to the group, get it fixed up quick, take the boss' feedback/lecture (fess up and it's quicker/easier), do the post-mortem, and put it in the back of my memory. Not so bad. I feel like the company would be dumb to get rid of someone if they just paid a bundle to educate them. :-) (As long as it's not repetitive.)
* Note that this was Windows: any files that were actually *in use* at the time could not be deleted - which was a lot, and enough to keep the *really important* files intact, and the programs that were running - kept running. Mostly. So I needed to get those "extras" that ran sometimes, like at system or application startup, maybe. After the delete, the system didn't crash (immediately), allowing for it to break for the next person actually trying to _do_ something with it, and allowing me to feign ignorance. Phew.
A memory issue
One job I worked on was for a process controller that had to run continuously for months - only it kept crashing after a week or two. Basically we'd been failing to free memory in a small I/O buffer after discarding it so it was slowly eating up memory. By sheer chance I noticed the computer it was running on had half the memory we'd specified, so we said that was the cause, but that as a courtesy we'd transfer the code to a new machine for them if they got one with the specified amount. A quick tweak of the code and it was all smiles.
Re: A memory issue
A job that I was at was having to upgrade our embedded PC because the current model was approaching end-of-life.
We got an upgraded model, and I (as the IT guy? but the developers set the requirements..) was going through things with one of the developers. The replacement was spec'd to match the current embedded PC, for which I knew the specs and operational parameters (but not all of the wiring). We went through, checked the (vendor pre-configured) BIOS settings, passwords, settings, hardware. It was a preview unit, so I didn't expect everything to be perfect.
I verbally commented, "Huh, it's only got half the RAM." The developer testing with me didn't flinch or make a comment, and I just noted it on the test form - preview unit has half RAM of shipping units. BIOS configured correctly.
This little note started a shit-storm. The part about them sending us a preview based on the "current specifications"? They did. The current specifications were ... incorrect. As a twist, they were correct per what the vendor had, but incorrect per what my company had specified to the vendor. Either we didn't pre-production check the systems in the first place (the requirements were *much* less strict than the upgraded units), or it was simply missed by the developers. In fact, they had been shipping us half-RAM units for quite a long time, and we had *thousands* of such machines in the field.
At the same time, we were working through software optimization to get memory usage down in our software. A few customers were trying to do not-as-small of tasks, leading to out-of-memory errors on their units. Gerp. The developers, optimizing memory usage, were unaware that the systems in fact only had half the specified amount of RAM.
The vendor offered us the additional RAM, no extra charge, but there's still the ~$2000 cost per unit to actually do an upgrade (plus intercontinental plane tickets). The vendor also offered us the option to use their warranty services to get the RAM installed, however we couldn't avail ourselves of this service: they would only provide for opening up the PC box and putting the RAM in, closing the box and connecting the keyboard/mouse/monitor. The PCs were built into another enclosure, and wiring was configured just-so -- I mentioned before, I wasn't fully familiar with the wiring. External ports (outer chassis) had to match internal ports (PC), and if cables (network, or USB, say) were connected to the wrong ports - chaos could ensue. If a port were *not* reconnected, how does the vendor test? In the end, we had to either return the units for service or go to the clients' site and service the units on-site.
All because of what I thought was a simple oversight with a preview unit. (I am a *technical*, _analytical_ person, and the feedback I tend to get about that is: "here, take our money!!" It doesn't work great for social scenarios, though.)
* end-of-life, this was a 3 or 5-year supported model (not 7 or 10), relatively modern and recent specs. We designed the product, and then it took us a while to get it to manufacturing, so we were 1-1.5 years into the product cycle before we started shipping. Then, not long thereafter, we had to consider an upgraded base unit.
Honestly
there just shouldn't be a Fahrenheit option.
There are only three numbers in IT: 0, 1 and ∞. As no measure of temperature is useless and an infinite number is ridiculous, let's just have one. Long ago Fahrenheit and US customary units (ft, lbs, gallons etc.) passed their use-by date, as might well the jurisdiction(s?) where they still predominate.
In the case of laboratory freezers which might go as low as -60°C, Kelvin might make sense (about 213 K) but is probably silly for everyday phenomena such as weather forecasts outside of Siberian gulags.
Re: Honestly
I'm ancient enough to have been brought up in the UK with both Fahrenheit and Celsius (or Centigrade as was).
I still find it easier to relate to Fahrenheit for higher weather temperatures (100 degrees is very hot) but Celsius for near freezing point!
Re: Honestly
"(100 degrees is very hot) but Celsius for near freezing point!"
Only for the original scale designed by Anders Celsius. On the current Celsius scale, 100 degrees is the boiling point.
Re: Honestly
As an engineer in the UK, it always amuses me that "British" units are used in the US, whereas engineers in the UK use SI, unless we're down the pub or buying apples from a market trader.
Every now and then, I have to convert US data into something I can use, which is always a headache, because you can't always trust online unit conversions e.g. I've seen some dodgy versions of BTU/ft/hr into W/m/s, which can cause some interesting problems.
Re: Honestly
The English units are used in name only for most of them. The actual value, on the other hand...
Re: Honestly
Kelvin has a slight downside of requiring more than one byte to store possible values during normal use (assuming you're not using floating point).
With Celsius you can store between -128 and +127 in a signed 8-bit integer, which should suffice for most situations.
And if you decide to store or process the values with an offset, you might as well offset it by 273 and use Celsius ;)
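The byte-count point is easy to check with Python's standard `struct` module. This is a small sketch assuming whole-degree readings. The helper names are made up for illustration.

```python
import struct


def pack_celsius(temp_c: int) -> bytes:
    """Signed 8-bit integer: valid for -128..127 degrees Celsius."""
    return struct.pack("b", temp_c)


def pack_kelvin(temp_k: int) -> bytes:
    """Unsigned 16-bit integer, little-endian: room for any everyday Kelvin value."""
    return struct.pack("<H", temp_k)


def fits_unsigned_byte(value: int) -> bool:
    """True if the value can be packed into a single unsigned byte."""
    try:
        struct.pack("B", value)
        return True
    except struct.error:
        return False


print(len(pack_celsius(-60)))   # 1 byte for a -60 C freezer
print(len(pack_kelvin(213)))    # 2 bytes for roughly the same temperature in K
print(fits_unsigned_byte(213))  # True: a cold freezer still fits in one byte
print(fits_unsigned_byte(293))  # False: room temperature in K overflows it
```

Subtracting 273 before packing brings every everyday Kelvin value back into one signed byte, which is the commenter's joke: the result is simply Celsius, to the nearest degree.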
Re: Honestly I'm bilingual
Ladies and gentlemen - this is your captain speaking. We will be flying at 40,000 ft at 1000 km per hour.
It's hot today - it is in the 20s... do you remember it being in the 80s when you were young?
One Friday, he accidentally tweaked a setting so the freezers monitored temperatures measured in Fahrenheit instead of Celsius – but didn't change alarm thresholds accordingly.
Surely there should be a code freeze on a Friday