Company that made power systems for servers didn’t know why its own machines ran out of juice
- Reference: 1760942049
- News link: https://www.theregister.co.uk/2025/10/20/who_me/
- Source link:
This week, meet a reader we'll Regomize as "Cole" who told us he once worked for a multinational company that built power systems for servers.
The company's products worked well, but its factory was a mess.
"For years we had problems with the factory servers shutting down over long weekends," Cole told Who, Me?
Nobody could figure out the reason for the outages, but it was generally agreed that external events that took place over long weekends – things like major roadworks or electricity companies working on the grid – caused the issues. IT staff therefore decided that acquiring a bigger uninterruptible power supply (UPS) would improve resilience.
After that new machine came online the problem disappeared for a while, then re-emerged over the Christmas break.
Cole's company decided an even bigger UPS must be the answer.
They were wrong. Even after the bigger batteries came online, servers still slumped over long weekends.
[5]Bored developers accidentally turned their watercooler into a bootleg brewery
[6]After deleting a web server, I started checking what I typed before hitting 'Enter'
[7]Playing ball games in the datacenter was obviously stupid, but we had to win the league
[8]I was a part-time DBA. After this failover foul-up, they hired a full-time DBA
Cole eventually figured out the problem: The switch that controlled power to the company's servers was the same switch that powered his workshop.
As a sensible chap, Cole had been hitting that switch every night as he left work. The UPSes had enough juice to keep the company's servers running overnight during the week, and between Friday evening and Monday morning – but not enough for a long weekend.
"Since I was first in and last out, the IT people never saw the power down and probably didn't look closely at log files to work out what was happening."
Cole's employer fixed the problem with a sign.
"We labelled the switches clearly with a warning and instructions," he wrote.
"I never got the blame, in fact our IT guy just shrugged when the cause and solution were described to him."
Have you lost track of which switch controls what box? If so, [11]click here to send us your story so we can share it in a future edition of Who, Me? ®
[5] https://www.theregister.com/2025/09/22/who_me/
[6] https://www.theregister.com/2025/09/15/who_me/
[7] https://www.theregister.com/2025/09/08/who_me/
[8] https://www.theregister.com/2025/09/01/who_me/
[11] mailto:whome@theregister.com
How could they not figure out the timing?
Wouldn't you plug something into a non-UPS-protected outlet that logs to a file every minute, so you could figure out when it goes down? Or interface with the UPS to determine when it sees its external power going offline?
Had they done even the barest minimum of diagnostics (something like the heartbeat logger sketched below), they would have determined when the power was cut off, and it would quickly have become obvious that it happened after the "last person to leave" had left for the night.
They spent who knows how much on two unnecessary UPS upgrades because no one had the brains or curiosity to investigate anything. Why can't I hammer that nail in? Guess I'll use a bigger hammer, rather than noticing I'm trying to hammer it into stainless steel plate!
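A heartbeat logger along those lines is only a few lines of Python. A minimal sketch, assuming all you want is a per-minute timestamp file on a box plugged into a non-UPS-protected outlet (the log file name is illustrative):

import time
from datetime import datetime

LOG_FILE = "power_heartbeat.log"  # illustrative path

while True:
    # Append a timestamp once a minute; after an outage, the gap in the log
    # shows exactly when the unprotected box lost power.
    with open(LOG_FILE, "a") as log:
        log.write(datetime.now().isoformat(timespec="seconds") + "\n")
    time.sleep(60)

Line the last entry before each gap up against the sign-out sheet and the "last one out flips the switch" pattern should drop out almost immediately.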
Re: How could they not figure out the timing?
Yep. Sounds rather ... unlikely, doesn't it?
Re: How could they not figure out the timing?
At some of the places I've worked in the past, that sounds entirely possible. Actually, the older the factory the more of a nightmare electrical systems become. The term "Grew like Topsy" comes to mind.
Re: How could they not figure out the timing?
>>"Grew like Topsy"
*grow'd
Original is "...I 'spect I grow'd."
/mines the one with Uncle Tom's Cabin in the pocket... you did use quotation marks after all!
Re: How could they not figure out the timing?
Absolutely!
As we expanded we took over the adjacent buildings on either side. Not wanting to incur three utility bills, and also needing a substantial three-phase power supply, we organised a new single supply and fed all three units from the same source. The interconnections were such that breakers in one unit controlled the supply in another. The 'labelling' was poor, and I spent considerable time sketching the scheme as it was, to ease diagnosis by those who needed to know.
Of course, the electricians who came in to add a minor circuit ignored all this. They couldn't get into one cabinet, which was locked, padlocked and had a sign stating 'Isolate in Unit 2 before opening this door', so I found them prising the door off its hinges in order to gain access to what they assumed was an 'unused junction box'. They had even taken the handle off the local door-mounted isolator, as it was preventing the door from being released. Instead of looking at the 'scheme' they had 'followed the cables' and made a mistake where the ducts entered the building.
They thought a pair of four-core, 120 mm², three-phase, armoured cables would be the very thing to power an outside light.
Re: How could they not figure out the timing?
Currently in 2 adjacent units which we mainly use for warehousing. I'm not sure what the previous occupants did, but when we had the electrical circuit inspection done when we moved in, we discovered the building has 2 x 100A 415V 3-phase supplies which then split out to 9 different circuit breaker boards around the building, and we had a total of 108 circuits to be inspected. I suspect the largest part of our electricity usage is for lighting in the warehouse and offices, even with LED lights installed everywhere.
Unfortunately the 3 things we have which needed 3-phase power couldn't be sited near the existing power, so we had to have more 3-phase distribution cabling installed for our electric forklift charger, car lift and EV chargers. When the car lift and forklift charger were installed (2 separate installations), both times the electricians managed to take out all of the power to the offices on the opposite side of the building, because a couple of the distribution panels are daisy-chained.
Re: How could they not figure out the timing?
This has nothing to do with nightmare electrical systems.
They had the system logs, thanks to various values of UPS. As such, they knew exactly when the power went down. Extrapolating the why from the when has been child's play in every RealLife scenario I've ever been a part of.
Re: How could they not figure out the timing?
Interface with the UPS to determine when it sees its external power going offline?
Most UPS manufacturers will charge you extra for the network card that allows you to use SNMP (or whatever) to monitor the UPS. It's not exactly expensive compared to the cost of the UPS itself, but given the level of joined-up thinking on display in this story, I wouldn't be surprised at someone OKing spending ££££ on a new UPS while balking at spending the extra hundred quid to monitor it properly (there's a cheap workaround sketched below)...
But thanks for reminding me I need to chase up with Eaton about why one of our UPSes just dropped offline at 57% battery during a recent power failure. It's been a couple of weeks since they last replied; I think they're stumped.
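If the network card never gets approved, Network UPS Tools (NUT) over the cheap serial or USB link plus a trivial polling script gives you much the same timeline. A minimal sketch, assuming the UPS is already configured in NUT under the name "myups" on the local host; the UPS name, log file and poll interval are all illustrative:

import subprocess
import time
from datetime import datetime

UPS_NAME = "myups@localhost"  # hypothetical NUT identifier for the UPS

while True:
    # Ask NUT for the UPS status: "OL" means on line power, "OB" means on battery.
    result = subprocess.run(
        ["upsc", UPS_NAME, "ups.status"],
        capture_output=True, text=True,
    )
    status = result.stdout.strip() or "unknown"
    with open("ups_status.log", "a") as log:
        log.write(f"{datetime.now().isoformat(timespec='seconds')} {status}\n")
    time.sleep(60)  # poll once a minute

In Cole's story, an "OB" entry landing a few minutes after close of business every Friday before a long weekend would have pointed at a wall switch rather than the grid.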
Re: How could they not figure out the timing?
> Most UPS manufacturers will charge you extra for the network card
Or the trivial piece of cable that will connect the USB data lines from the deliberately non-standard socket on the back of the cheaper units. Hint: once you've found the required connections online and spent the couple of minutes making your own frankencable (it really is just a cable), connect ground and the data lines, but *don't* connect the +5V line on the USB side; it probably isn't actually 5V!
Re: How could they not figure out the timing?
Many years ago I worked somewhere that had a program doing exactly that kind of logging. It sat on two machines, one on the UPS-protected power and one not. Both produced identical versions of the report, except that one would stop when the power to the rest of the building went out. Nobody really read the reports; they were each stored in an individual text file, so there were 1,440 files in every 24-hour period.
One day, however, someone did have a look, because they'd just taken over as department head, technology. They wanted to see what this small form factor box they'd found labelled 'reports' actually did. For the most part it was standard stuff inside the report about power being nominal, network connection okay, system uptime, etc. There was also a curious line about the number of administrators on the system.
He queried it with a senior IT staff member who didn't have a clue, nor did anyone else. The program that produced the reports was written in-house, so he looked up who had done so, only to be dismayed to discover that the author had retired about a year before. Someone still had his number, so the pensioned techie was duly called, and he explained that he'd added that as a little security twist.
If the number of accounts with admin privileges increased without anyone being officially added, then they could investigate further. What nobody knew, and he explained, was that the program had an option (not enabled) that allowed notification of this by email. There was also an export-to-Excel function, so you could select a period of reports and export the data. All very clever, and sadly not really being used.
There were two boxes with the reports program on them so that, if the mains power went out, the non-UPS-protected box would stop reporting. The one day while I was there that the mains went off for 20 minutes overnight, we had a record of everything.
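For anyone curious, the guts of that sort of reporter amount to very little code. A very rough sketch of the idea, assuming a Linux box where "administrators" means members of the sudo group; the group name, report fields and file naming are illustrative rather than a reconstruction of the original in-house tool:

import grp
import time
from datetime import datetime

def admin_count(group="sudo"):
    # Count the accounts in the privileged group.
    return len(grp.getgrnam(group).gr_mem)

def uptime_seconds():
    # The first field of /proc/uptime is seconds since boot.
    with open("/proc/uptime") as f:
        return float(f.read().split()[0])

while True:
    now = datetime.now()
    report = (
        f"time: {now.isoformat(timespec='seconds')}\n"
        f"uptime_seconds: {uptime_seconds():.0f}\n"
        f"admin_accounts: {admin_count()}\n"
    )
    # One small text file per minute, 1,440 of them a day, as in the anecdote.
    with open(now.strftime("report_%Y%m%d_%H%M.txt"), "w") as f:
        f.write(report)
    time.sleep(60)

Run it on one UPS-protected box and one unprotected box and you get the "report stops when the building loses power" behaviour described above for free.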
"Must be nice to work at a company where issues are solved by throwing money at it."
Throwing money at the wrong thing(s), then saving money by erecting signs instead of getting an electrician to put in a dedicated feed for the servers.
I've done that at several customer premises - nothing as drastic as server power, just power to sensor lights in entry corridors to save people breaking their necks if they needed to come in after dark. Massive "DO NOT SWITCH OFF!" signs in bold red lettering. You think people noticed?
That is why you use tape to set the switch "permanently" to ON.
Duct tape. The universal band aid to all life's problems.
Locking MollyGuards.
Available at a sparky supply shop near you; usually under $CURRENCY20 each.
You have to feel sorry for [1]Molly, she is now immortalised as the "thing that has to be guarded against".
[1] https://www.hackersdictionary.com/html/entry/molly-guard.html
I've worked at companies like that! Often it's been when something could be quickly bodged in-house to work round the immediate problem.
"I need a shelf for..."
"No problem, we'll make you one!"
This ignored the fact that plastic-bending and welding the shelf cost three or four times as much, in materials plus labour, as it would have to just buy something from the DIY shop over the road.
A case of oops rather than UPS
Sorry, couldn't resist. I'd better be going
Re: A case of oops rather than UPS
That was a powerful joke
Re: A case of oops rather than UPS
The sort of thing that could spark a pun-off: a current trend at El Reg.
Re: A case of oops rather than UPS
We Volted at the chance to have some pun
Perhaps the IT folk weren't entirely in the dark
You cannot have a UPS that's too big, but selling manglement on that might be tough. :)
So if Cole inadvertently and unknowingly provided a pretext for a UPS capacity upgrade (or two), discretion might have been the better part...
In most contexts, obtaining an allocation of resources to solve an existing problem is several orders of magnitude easier than getting even an in-principle allocation to prevent a critical problem arising in the first place.
It's wild to me that the likely cause of the outages was identified correctly, but no one bothered to find out when and how they occurred!
Even if it was an issue with the grid, I'd personally have a chat with them about our power dropping for extended periods every weekend.
Must be nice to work at a company where issues are solved by throwing money at it.