News: 1675413071

  ARM Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life (Terry Pratchett, Jingo)

No, you cannot safely run a network operations center from a corridor

(2023/02/03)


On Call With a whole month of 2023 already consigned to history, The Register brings you another instalment of On-Call, our weekly column in which readers share their stories of past deeds performed in the service of keeping computers copacetic.

This week, meet "Nick" who once ran the Japanese branch of a storage-as-a-service provider back in the early 2000s.

"We installed and managed disk arrays and storage area networks at customer sites and in datacenters," Nick explained. In these pre-cloud days, the as-a-service part involved 24x7 monitoring from a network operations center (NOC) in the US.

"We also had a Japanese partner who wanted to run a NOC in Japan, so we helped them get one set up," Nick recalled.

As you'd expect, that facility needed a rack full of kit – which Nick's firm provided. And as Nick was the chap in charge, he was asked to inspect it.

"When I got to their NOC to see how things were progressing, I found this rack sitting in a hallway, with a network, hard-wired frame relay line, and power connection on the wall."

Nick pointed out that the rack really needed to be in a secure area – but that advice was dismissed. He was assured that the Japanese partner was confident that if they told their reliable and honest staff to leave the rack alone, no harm would befall it. Because of course.

[5]User was told three times 'Do Not Reboot This PC' – then unplugged it anyway

[6]New IT boss decided to 'audit everything you guys are doing wrong'. Which went wrong

[7]This can’t be a real bomb threat: You've called a modem, not a phone

[8]Don't lock the datacenter door, said the boss. The builders need access and what could possibly go wrong?

Early in the morning a couple of months later, Nick was called into action – the NOC had disconnected from the network.

"After lots of running around and troubleshooting, we discovered that the frame relay line coming from the wall had been ripped out, then just stuffed back into the junction box."

None of the reliable and honest staff who had been told to be careful near the rack and leave it alone admitted to the deed. Because of course.

"They did move the rack to a secure location, though," Nick recalled.

So at least he eventually got that message across.

Has your advice been ignored, leading to utterly predictable problems? If so, [10]click here to send us an email and we'll try to share your story in the future. ®

[5] https://www.theregister.com/2023/01/27/on_call/

[6] https://www.theregister.com/2023/01/20/on_call/

[7] https://www.theregister.com/2023/01/13/on_call/

[8] https://www.theregister.com/2022/12/23/on_call/

[10] mailto:oncall@theregister.com



UCAP

I remember once (about 25 years ago or so) having to go over to Japan to witness some factory acceptance tests (ultimately successful, but not without a major screw-up on the way), before ending up in Tokyo to witness a site acceptance test for a new satellite NOC. These latter tests were not successful in any respect; in fact I think that I failed them on just about every item we looked at. Basically they were woefully unprepared, and I suspect were expecting me to just sign on the dotted line rather than actually check things.

One of the things I realised from this trip, however, is to be careful not to make the situation worse when dealing with the Japanese. The loss of face was already considerable; there is no need to push them deeper into the quagmire of shame. Because of the way I handled things, they specifically asked me to come out a couple of months later to rerun the tests, with results that meant that not signing on the dotted line was out of the question.

Losing face

ColinPa

I had to go to an Asian country to help diagnose a performance problem - they wanted to go live the next week and it was not performing. The first thing I spotted was the applications still had debug code in them - they were writing out about 1000 printf's a second. Rather than tell them the problem, I showed the local team how to use the tool - the problem was obvious. The local team spotted the problem and went away and got it fixed. I then spotted they had some bad SQL statements which could not be cached. Overnight they applied fixes, and by my second day the throughput had doubled!

I had to go to an all hands meeting on my second day, for a project status update. There were the bank staff, the staff from the company providing the systems, me, and a local person from my company. The senior project manager stood up to speak. I could not understand what he said - but he was clearly very upset and everyone else dropped their heads in shame as he berated them. I was told later the manager said things along the lines of "You have been working on this performance problem for three months, and made no progress. This person comes from the UK - and in the first day finds problems and has doubled the throughput - what have you been doing for three months!!!!!! You told me it could not be fixed...." My local contact was embarrassed having to explain the rant from the senior project manager.

Huh?

chivo243

Where's the rest of the story?

Did you forget the part about the cleaners??

Re: Huh?

LogicGate

Cleaners unplug the cable. A passing cart full of heavy stuff will grab the cable and rip it out of the junction box. Regrettably, any believable argument for this having gone unnoticed is made null and void through the action of stuffing the cable back in again.

Small loss of face avoided through the application of big loss of face.

Re: Huh?

chivo243

At a place I worked at, the cleaners were charged with waxing the floors, all of the floor... Copiers, printers and vending machines (you read that right!) were moved without unplugging the network cable... I got to repair the RJ45s in the wall and replace the network cables. I was also tasked with finding a physical solution for preventing these unfortunate 'accidents'. My first thought came right out of Simon's rantings...

Re: Huh?

H in The Hague

"I was also tasked with finding a physical solution for preventing these unfortunate 'accidents'."

In the UK, gas cookers need to be fitted with a safety chain so they cannot be accidentally moved, ripping the gas hose off.

https://www.screwfix.com/p/cooker-stability-chain-hook/30143

Years ago I worked on a piece of scenery which was occasionally rotated by hand on a slewing ring (well, a VW Golf hub from the local breakers' yard). That had a power cable feeding lights in the unit (no battery powered lighting and wireless DMX in those days). Although the operator knew when to stop the rotation we were worried that others might play with the unit and damage the cables, so I also fitted a wire rope, slightly shorter than the cables, to limit rotation. And the slewing brake was built using the jack of my first car, an Austin Maxi.

Here's one for the weekend -->

Re: Huh?

Julian Bradfield

It's usually said to be to prevent the cooker tipping forward.

Have these bureaucrats ever tried to tip a 100kg range cooker? (Or indeed accidentally move it.)

Re: Huh?

Flightmode

Tangent, but on the topic of believable arguments, I present you with an unbelievable one from real life. I swear, these guys must have had Creative Consultants on staff.

In the early 2000s, I worked for an ISP in mainland Europe. All our international circuits (at the time an impressive mix of STM-1 and E3!) were leased from one provider that had serious quality issues. I'm not sure if it was poor documentation or crappy hardware, but we had outages in various parts of the network pretty much every day. There were several months where our SLAs kicked in and we didn't have to pay them anything at all for their services, that's how bad it was.

Anyway, after a longer-than-usual outage between Amsterdam and Paris we got a root cause analysis back. It basically explained that over the weekend before, there had been a riotous demonstration at a train station somewhere in Belgium. A large group of people had stormed the tracks and vandalized the buildings[0]. This wasn't in and of itself the cause of the outage, oh no; one of the rioters had as part of his protest jammed a crowbar into the train tracks. A couple of days later, a commuter train[1] entered the station on that track, stopping right over the crowbar which buckled and bent under the weight of the train. When the train rolled out, the force on the crowbar was released, and as it snapped back into its regular shape it launched in a high arc - skipping over two other platforms - and hit the provider's junction box at pretty much the other end of the trainyard. It struck the door at exactly the right point for it to buckle inwards and break only our patch connection. All other cables in the box were undamaged, only our circuit had an outage.

Anyway, the provider is no longer around. A couple of bankruptcies[2] and buy-outs later, they were absorbed by a national carrier's ISP arm and probably dissolved there. Good riddance.

Icon for rallying crowds.

[0] Not sure what about, we could never find anything in the papers on this.

[1] They were very insistent on this being specifically a commuter train, not just any train, also in follow-up discussions. Still not sure why this was important.

[2] Their head-office was in Brussels. One of their bankruptcy filings came when I was in town for a two-week training, so I was basically asked by my management to stay in Brussels and more or less force my way into their NOC if their staff decided to walk out to monitor our circuits in their absence. As luck would have it, I didn't have to.

wolfetone

" He was assured that the Japanese partner was confident that if they told their reliable and honest staff to leave the rack alone, no harm would befall it. "

Reliable and honest staff: Hold my sake.

chivo243

Hold my sake. There are redneck Japanese?? Who knew!

Time to make another cuppa... and grab a towel or two

One time

elsergiovolador

That one time, the owner of the company got really upset that once again someone had restarted the server, so he yanked it out of the server room and put it on a desk next to him.

Mind you this was a small open plan office and many know the noise a server makes when it boots up and often afterwards.

Turns out the fan noise that one was giving out was unbearable and everyone complained.

Blasting the latest glam rock tunes to cover it up didn't help.

After two or three hours, the owner took the server back to the server room and was working on the floor by the door keeping an eye on it.

Then he realised nobody was restarting it. It had a cooling fault, and overheating was what was causing the reboots.

Well, I have done sparc assembly in my time (remember Dave Sitsky and
I did a port of the kernel to the ultrasparc running in 32-bit mode
before you did the sparc64 port) but the stuff you're doing in there
isn't just assembly, it's magic assembly. ;)

- Paul Mackerras admiring Dave Miller's assembly on linux-kernel