After three weeks of night shifts, very tired techie broke the UK’s phone network
- Reference: 1742801414
- News link: https://www.theregister.co.uk/2025/03/24/who_me/
This week, meet a reader we’ll Regomize as “Wayne” who in the 1980s maintained the specialized “System X” hardware that ran telephone exchanges.
Bad things happen when phone exchanges go down, so the System X machines Wayne maintained were installed in pairs named Processor 1 and Processor 0. Processor 1 was the active device and Processor 0 was the hot standby.
One evening, Wayne and his colleagues were asked to upgrade a major phone exchange that handled all the traffic for the North West of England and connected that region to the south of the country.
Upgrading the machines required them to be powered down so new hardware could be installed. System X’s designers deliberately made that hard to do, to prevent accidental shutdowns.
To perform an upgrade, Wayne therefore needed to type commands into a terminal to put components into a state of “soft shutdown”, which was indicated by a green light turning red.
Once the lights on all components in a Processor turned red, it was safe to turn it off and replace hardware.
At the time of this job, Wayne had been doing similar upgrades for about three weeks and they all required pulling an all-nighter. Wayne’s routine saw him roll into work at 21:00 to start preparation, before the serious part of the upgrade started at around 02:00 the next morning.
After three weeks of overnight upgrades, Wayne was a mess.
He was also disconcerted because, for reasons he never understood, Processor 1 was always installed to the left of Processor 0. Wayne felt that Processor 0 should be on the left and Processor 1 on the right, to match the left-to-right convention of western numbering and writing.
You can guess the rest: After performing a soft shutdown of Processor 1, Wayne’s discomfort with left-to-right numbering collided with his extreme fatigue, and his brain decided the green lights on Processor 0 were an invitation to remove its components.
Processor 1 was now in a state of soft shutdown and Processor 0 was missing the hardware that made it work.
“All trunk telephony traffic between the North and the South of the country went down,” Wayne admitted to Who, Me?
Wayne still isn’t sure why alarms didn’t start going off.
And because they didn’t, he kept doing the tests he always did at this point of the upgrade. All his previous jobs had gone fine. This time his terminal showed all sorts of anomalies.
Eventually, his tired brain recognized the errors he was seeing and remembered the last time he’d seen them: At a training course during which he’d practiced shutting down an entire telephone exchange.
Panic lanced Wayne’s brain.
“I rushed back to the processor and started slamming cards back in,” he told Who, Me? “The lights stayed red a frustratingly long time but then slowly started to turn green.”
He raced back to the terminal. “My sweaty fingers slipped over the keyboard as I typed the recovery commands as fast as I could, desperately willing the status to improve.”
Over the next few minutes, more lights turned from red to green and he was confident phone calls were again making it across the UK.
As if to prove it, Wayne’s phone rang. He picked it up and heard colleagues at the network monitoring center ask what had just happened.
“I could have lied and said I didn't know but I couldn't think of anything clever, so I admitted my mistake,” Wayne told Who, Me?
“We'll have to report this,” he was told.
And then, in the wee small hours, with weeks of fatigue in his bones, brain and blood, Wayne drove home and flopped into bed.
With his head full of existential dread about what this incident meant for his career, sleep would not come. “I just lay in bed, staring at the ceiling, waiting for morning and the inevitable phone call from my boss,” he told Who, Me?
Later that day …
Wayne eventually roused himself and the phone rang.
The call went like this:
Boss: “How are you? I believe you had a long night.”
Wayne: “I’m sorry, I was tired. I know it’s no excuse.”
Boss: “What are you talking about? I thought you were upgrading a trunk exchange last night?”
Wayne: “Yes, but ...”
Boss: “Some contractors dug through a load of fibre by accident next and completely killed transmission.”
“And that was that,” Wayne told Who, Me? The network monitoring center had intended to report his mistake, but the cable cut caused a much bigger outage that had the same effect as the one Wayne had created. His mistake was never mentioned again and had zero consequences.
Has somebody else’s mistake hidden your mistake? Don’t make the mistake of not sharing your story: [10]Click here to send us an email so we can tell your story in a future edition of Who, Me? ®
[10] mailto:whome@theregister.com
I'm pleased his Boss wasn't a Wayne Kerr about it
You can always rely on diggers...
...to search and destroy underground fiber!
Re: You can always rely on diggers...
We had some building works going on at our company. They invested time and effort in tracing all the ducts across the affected area.
D-Day came and they started digging. When telephone lines & data circuits went offline, the builders denied it was their fault, as they were following the plans which avoided all the ducts, enthusiastically prodding their site plans showing all the traced ducts.
BT turned up and pointed to the fragments of duct & cable in the spoil.
I think the lawyers enjoyed that one.
Re: You can always rely on diggers...
I've seen a similar incident. They'd even had a surveyor out to mark all the lines, and there were neat little flags every meter or so indicating where comms lines, water, etc. were located. They start digging and find several phone lines... well damn, best leave this alone while the right parties are notified, so start digging over there. Second or third scoop in, they nick (but don't cut) several large power lines feeding several buildings. Pants now thoroughly browned, things are halted.
Turns out the maps the surveyor used and the actual reference points the surveyor used were not the same. So all his neat little flags were over a meter off and the holes marked for "dig here" basically all exactly over the underground infrastructure they were supposed to miss.
Re: You can always rely on diggers...
In a previous job some very old "temporary" buildings (that had been around for more than 30 years) were being knocked down to be replaced with shiny new offices. All the network cabling for half the site (approx 700 employees at that point in time) ran across the roof of those buildings. So before any demolition or building work was started, time and effort was spent to bury all those cables safely underground instead.
Can you guess what happened on the very first day of the demolition?
I blame the boss!
Pulling all nighters for three weeks is clearly not going to go well! On top of that, doing something critical solo...
It was inevitable.
Came here to say exactly the same!
Ditto
Working long shifts without breaks will result in bad things sooner or later - I recently read the public inquiry report on the Clapham Junction rail disaster, and one poor guy pulling overtime (like they all did) changing signal relays was a substantial part of the causal chain.
Re: Ditto
At Clapham a larger and more terrifying issue was that no-one involved had ever been properly trained in what they were doing. It was all passed on 'man to boy' without formal training or qualification, and this extended to all levels of the organisation. At the inquiry the workers and their supervisors claimed absolute ignorance of standard practices used in any safety critical wiring trade and mandated by British Rail in its standards documents - things like technicians performing wire counts at the end of their work and those then being checked and confirmed by supervisors.
The redundant wire that caused an incorrect proceed signal to be shown had been unscrewed from a terminal and lightly bent out of the way. When it slowly bent back it made contact with the screw terminal and disaster happened. At the inquiry the tech could not see how what he had done might not be in accordance with a safe system of working. Literally incredible.
Recipe for disaster
Not just alone, but they expected him to do the most critical part after 2 a.m. The [1]Window of Circadian Low is roughly 3 a.m. to 5 a.m. That's when one's most likely to cock it all up due to drowsiness.
[1] https://skybrary.aero/articles/window-circadian-low-wocl
Re: Recipe for disaster
At least he wasn't working in a Soviet nuclear plant running an experiment...
Pulling all nighters for three weeks is clearly not going to go well!
That pattern of working is, as the title of the article says, a night shift pattern. Once you've had a couple of days/nights to recalibrate then it's not a million miles away from doing a day job, apart from the darkness (probably a moot point if working in a comms bunker or somesuch, and the lower natural ebb after 3am).
Far better than some of the working patterns I've had in earlier years of my career which involved things like normal workdays punctuated by night working every few days or, in some cases, normal workday followed by working through the same night in a 24-30 hour shift.
“...Wayne felt that Processor 0 should be on the left, and Processor 1 on the right to match the >>>right-to-left<<< convention of western numbering and writing."
"You can guess the rest: After performing a soft shutdown of Processor 1, Wayne’s discomfort with >>>left-to-right<<< numbering collided with his extreme fatigue..."
Have I been reading things wrong for more than 60 years? I learned to read and write left-to-right and have never had any discomfort with it.
Maybe the article was written during an all-nighter?
Not all writing systems are left to right. Some, such as Japanese, can go either way, seemingly at a whim. Generally, modern texts are left to right and traditional ones right to left, but some newspapers are also right to left. And sign-writing on vehicles is front to back, so that's left to right on the left side and right to left on the right... Admittedly the ideographic characters carry more individual meaning than our Roman letters, so there is a lot more context with which to figure out which direction makes more sense.
I think "convention of western numbering and writing" covers that.
Ex System X tech here. I worked on System X as an apprentice, a tech and into management
Wayne was working on a DMSU - the now obsolete layer of switch handling trunk calls, with no direct connection to end customers. All the DMSUs were interconnected with each other, deliberately. There's no conceivable scenario where the loss of an exchange prevents traffic flowing 'north to south' or in any other direction. Every local exchange is connected to more than one DMSU. All these interconnections are duplicated via different physical routes to prevent a single JCB bringing things down. If a DMSU disappeared from the network, no-one making calls (apart from people already on calls through that specific exchange) would notice.
The network management centre would very much notice though. Instantly. They'd know what had happened and why. Their admin tools would report that Wayne (him, specifically) had used man machine language (MML) commands to disable one side of the exchange and would then report that the other side had been depowered. Every large exchange has an 'out of area' line - a telephone line connected to a different exchange and sent over long cables, so that the site can be reached and people there can call for help even if it's all gone very wrong. That line would be ringing quite insistently.
I think this story has been embellished for dramatic effect.
There's no conceivable scenario where the loss of an exchange prevents traffic flowing 'north to south' or in any other direction
Thirty years ago, we lost a shed load of circuits. It transpired they all flowed through one exchange in Birmingham which was a core node in north/south traffic.
I think this story has been embellished for dramatic effect.
Many of these mea culpa stories (not just here on El Reg) are either embellished or flat-out lies, claiming something the writer didn't do or that didn't happen, for their five minutes of fame.
I remember hearing one on Simon Mayo's radio show where the claimant said they were responsible for putting up road signs that caused traffic chaos. They weren't. I know who did it, and that person did not write in to Simon Mayo's radio program ;)
Communication is important
I was a trainee so given the overnight shift to watch over a system which was on a two-week proving run. This installation was part of a linked system and I was given the middle section with a more experienced guy who had brought his caravan as his own 'welfare provision'. I had to sit in the control room whilst he sat in the caravan, probably thinking deeply. Nothing happened for several days until severe winds blew down the radio-link between stations, coinciding with a digger finding the new telephone lines. We hadn't been in touch with the other sites before so things carried on as normal........ Until the mains power-transformer failed catastrophically at 2:00 a.m. leaving us in near total darkness and isolated deep in the middle of nowhere. It was snowing.....
We needed to alert the other stations on the line. Raising my supervisor from his deep thoughts, I was sent off-site to find a telephone box.... This was long before mobile phones became popular.
The snow was getting deep and time was running out. I did find a working telephone box in a village a few miles away and called through to the 'command centre': "Just who are you?" they asked, not unreasonably. Eventually, I did convince them this was not a joke and they set in process appropriate action. It was a big interruption though; we reassembled a few weeks later to restart the two-week proving run. Fortunately, I had moved on in my training schedule.....
Re: Communication is important
> Until the mains power-transformer failed
You don't work at Heathrow do you?
Wayne's 100 quid bung helped me make that month's payment on my JCB. Lost a night's sleep though.
That story will be hard to BT