Techie ended vendor/client blame game by treating managers like toddlers
This week, meet a reader we'll Regomize as "Warren," a network engineer who found himself working on a project team that was trying to clean up a big stack of open tickets that had accumulated during the merger of two hospitals.
Some of the tickets were easy to fix – users would tire of waiting and just buy a new mouse to replace a malfunctioning one – but the one that landed on Warren's desk was more challenging, technically and politically.
"They 'airdropped' me into the middle of an ongoing discussion between the hospital and a vendor regarding the sub-par performance of a 3D scanner used to detect breast cancer," he told On Call.
"When doctors viewed scans, they would sometimes appear in two seconds, and sometimes appear in 45 minutes," Warren wrote. When times ballooned to the larger figure, it caused all sorts of nasty scheduling problems as appointments ran past their allotted time.
Despite that dire situation, the argument between the vendor of the storage system that held the scans and the hospital dragged on for months.
Warren started his attempt to sort this out by researching the network connecting the scanner, the storage device in one of the hospital's datacenters, and the medical suites where doctors tried to view the scans. He found robust 250 Mbps links and a two-hop connection across a small city.
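For a sense of scale, 250 Mbps works out to roughly 31 MB/s. Assuming a hefty study of, say, 500 MB (a figure we've assumed for illustration; the tale doesn't give scan sizes), that should arrive in about 16 seconds. Stretching the same transfer to 45 minutes implies an effective rate of about 1.5 Mbps, well under one percent of the link's capacity – the signature of errors and retransmissions rather than a shortage of bandwidth.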
He then joined a conference call to discuss the matter and found it depressingly familiar.
"The storage vendor blamed the client's network, and the client blamed the vendor's equipment," he told On Call.
Warren next went looking for network errors by analyzing logs for each device involved in carrying packets from the datacenter to doctors. He found nothing.
The only thing he could think of was asking staff to report the odd delays as soon as they appeared.
Which is why his phone rang the next day and Warren sprang into action, logged onto the router closest to the scanner, and found... nothing. But when he looked at the next device in the chain – a switch – its logs listed line after line of errors.
"It was spitting out a lot of garbage," Warren told On Call.
He asked around to find out who ran the switch and learned that the storage vendor managed the machine.
Warren knew this finding meant he would need to make another conference call, so he created a PowerPoint presentation that explained the network topology and displayed the relevant bit of the error logs to prove the source of the problem.
"I realized the managers were the sort of people you need to talk to as if they were toddlers to explain things," he told On Call.
That approach worked. Warren's bosses did the sort of things people do when they are impressed by clever people presenting irrefutable evidence. Representatives of the storage vendor could not deny culpability and remained speechless.
Warren decided to take charge and asked the storage vendor if they had ever tried the most obvious fix – replacing the network card on their array.
A week later, with a new NIC in place, the problem disappeared. The old card was clearly faulty, and, for six months, nobody thought of checking it despite the serious problems it created for patient care and hospital operations.
"That was probably one of my proudest moments in IT," Warren told On Call.
Has PowerPoint helped you to, er, make a point? If so, send On Call a slide email at oncall@theregister.com so we can transition to your tale on a future Friday. ®
Passing the buck is typical human behavior
Those on each side feel like they have enough on their plate without investigating an issue where they are guaranteed to find nothing should it turn out to be the other guy's fault. Game theory dictates that neither should put forth the potentially futile effort unless something happens to disturb the equilibrium - either it is costing the hospital too much money because it can bill fewer patients for expensive scans, or the vendor feels it is risking future sales if the hospital is unhappy with the quality of the product/service being provided.
Or someone who sits outside that standoff and takes pleasure from the simple act of solving a mystery is brought into the mix.
Who to blame?
Vendor: "I believe the issue is on your end".
Me: "Have you looked into it? I believe it may be on your end because of reason X and Y."
Vendor: "We will look into it if you 100% irrefutably prove it's not on your end"
I've had this discussion a thousand times... Bless vendor support that actually tries to cooperate.
Re: Who to blame?
Me: We have conclusively proven that there is packet loss introduced between router F and router G, both manufactured by you and directly connected with a DAC copper cable delivered by you. Can you help us troubleshoot this issue?
Vendor: It seems that the server hooked up to router A in your data center is connected with a third party SFP. Before we can troubleshoot this issue further, we're going to need you to replace that SFP with a Genuine Vendor Ultra Plus brand to ensure that the signal quality is good enough for the path.
Me: ...that's not how any of this works?
Vendor: Case is now Cust-Pending.
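For what it's worth, the "100% irrefutably prove" demand is sometimes doable without the vendor's help. A rough sketch of per-hop loss measurement in Python, assuming mtr is installed (the target below is a placeholder, and mtr may need raw-socket privileges):

#!/usr/bin/env python3
"""Run mtr and report where along the path packet loss first shows up."""
import re
import subprocess

TARGET = "storage-array.example.net"   # placeholder far end

report = subprocess.check_output(
    ["mtr", "--report", "--report-cycles", "100", TARGET],
    text=True,
)

# Hop lines look like: "  3.|-- router-g.example.net  12.0%   100  ..."
for line in report.splitlines():
    m = re.match(r"\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%", line)
    if m:
        hop, host, loss = int(m.group(1)), m.group(2), float(m.group(3))
        if loss > 0:
            print(f"hop {hop} ({host}): {loss}% loss")

# Caveat: loss at a middle hop that vanishes by the final hop usually just
# means a router deprioritizing pings aimed at itself; loss that persists
# to the destination is the real signal.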
Re: Who to blame?
Customer: we will replace your kit with something from a reliable supplier, please arrange to collect yours and close the service contract. Or, as an Australian client of ours once memorably informed us after 4 months of fruitless remote debugging, "if I don't see an engineer here by the end of the week, your kit is going in the river!"
4 days, one expedited visa, and a very expensive flight later, I was in his office, very jetlagged, on Friday morning. Took me a week to find and fix the (genuine) bug.
Re: Who to blame?
Had that years ago at a company. System/36 and a /38, all IBM kit including some of the newfangled PS/2 range :)
Had a problem.
IBM: The PCs are not valid
Me: They are all IBM
IBM: the emulator card is not ours
Me: All emulator cards are IBM and all the software is IBM's - no Rumba
IBM: Your network is at fault
Me: All our MAUs are IBM
IBM: Your cabling is not to standard
Me: Nope, it is all IBM
IBM: Oh.....
This was in the days when my weekends were often spent with long-arse bits of curtain rail, rolls of tape, and lots of shouting of "pull it", "push it" in stupid conduits. So glad when we moved, but through rose-tinted glasses they were fun days.
> When times ballooned to the larger figure...
Giggity.
Modern networking tends to be reliable and fault tolerant, so people have either forgotten, or never knew, how things were "back in the day".
Touching wood, I can't think how long it is since a single network interface failed, but in the days of thin (10 base 2) Ethernet it wasn't unknown for us to have to go round an entire segment systematically disconnecting individual machines until we found the one with the failing NIC (and that's after first changing the terminators at each end, as they were also known to fail). The difficulty was that, depending on the failure, it could impact the entire segment, not just the one machine.
The plus point, though, was that we were dealing with desktop machines with separate NICs*, not with network interfaces integrated into the motherboard, so it was generally just a case of swapping the failing card for another one (invariably NE2000 based), checking the interrupt settings and installing the driver.
*I think the last failure I had on a laptop the only solution was to disable the internal networking and use a USB - Ethernet adapter.
Aah yes, 10 Base 2, a.k.a. the distributed single point of failure.
Back in the late 1980s, there was a company in Taiwan which "recycled" MAC addresses on its clones of NE1000/2000 ethernet cards. When you got a new batch of cards which matched the MAC address of one or more cards on your existing LAN[0], much hilarity ensued. As a consultant, the first time was the worst ... after that, the symptoms were fairly obvious. I ran across the problem at probably a couple dozen small companies between '88 and '91ish, and then again (!!) in the mid-late '90s, when people started recycling old Netware kit for Windows networks at home.
[0] An "impossible event", at least according to Novell and IEEE.
Yep, had customers with exactly that. The most memorable being when roughly half the adapters in the batch had the same MAC. Oh how we laughed...
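For anyone who suspects the same trick today, the symptom shows up in the ARP cache as one MAC answering for several IPv4 addresses. A quick sketch of a check in Python, assuming a Linux box with the ip tool:

#!/usr/bin/env python3
"""Flag any MAC address that the ARP cache maps to more than one IPv4 host."""
import re
import subprocess
from collections import defaultdict

# `ip -4 neigh` lines look like:
# "192.168.1.22 dev eth0 lladdr aa:bb:cc:dd:ee:ff REACHABLE"
output = subprocess.check_output(["ip", "-4", "neigh"], text=True)

ips_by_mac = defaultdict(set)
for line in output.splitlines():
    m = re.match(r"(\d+\.\d+\.\d+\.\d+) dev \S+ lladdr ([0-9a-f:]{17})", line)
    if m:
        ips_by_mac[m.group(2)].add(m.group(1))

for mac, ips in sorted(ips_by_mac.items()):
    if len(ips) > 1:
        print(f"{mac} is claimed by {len(ips)} hosts: {', '.join(sorted(ips))}")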
Ping pointed the finger
I was involved in a set of long-standing problems with a customer. We made a networking product, and the customer complained that periodically the throughput dropped almost to zero – and by the time they came to look at the problem it had got better.
I think the customer found our support teams friendlier than their own support team.
I got them to run a simple program which, every minute, did a ping to the remote end and captured the response time.
They sent me the output, and I could see normally good response times with the occasional spike.
Our emails crossed: mine said "did you have a problem at 0713?" while theirs said "we had another occurrence at 0713".
The customer asked me to present to their network support team, and it took about half an hour.
After we had finished, we got pushback from them: "we do not see any errors/problems in our logs"
me: "What response times were you capturing for this time period"
tap tap tap.. "ahh we are just going on mute"
Eventually they said "we don't have any monitoring enabled on that part of the network" - which is why no errors were reported!
The networking people found the fault and the throughput problem was resolved.
The people I was working with still kept the ping every minute.
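For the curious, the "simple program" amounts to something like the sketch below – it assumes a Unix-style ping, and the target host and log path are placeholders:

#!/usr/bin/env python3
"""Ping the far end once a minute and log the timestamp and response time."""
import re
import subprocess
import time
from datetime import datetime

TARGET = "remote.example.com"   # placeholder remote end
LOGFILE = "ping_times.csv"      # placeholder log location

def ping_once(host: str):
    """Send one ping; return the RTT in ms, or None on loss/timeout."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        capture_output=True, text=True,
    )
    match = re.search(r"time=([\d.]+) ms", result.stdout)
    return float(match.group(1)) if match else None

while True:
    rtt = ping_once(TARGET)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(LOGFILE, "a") as log:
        # An empty second column marks a lost or timed-out ping
        log.write(f"{stamp},{'' if rtt is None else rtt}\n")
    time.sleep(60)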
Re: Ping pointed the finger
< "There's nothing in the logs"
> "Yes, but unfortunately having the logs turned off doesn't fix the problem we've got"
Blame the vendor
I don't understand companies that behave like that. I'm a freelance programming consultant. When one of my customers calls to complain about the behavior of one of my programs, I'm damn well going to find out what's going on and what I can do about it. If I do find out that the customer's network configuration is the basis of the problem, I'm not going to lay blame, I'll just explain the situation and what I can do to fix it.
In an entirely different domain, my shower recently developed trouble getting hot water. The best I could get was a tepid shower. I called the artisan who installed the equipment and asked him to come over and fix the issue. He was there the next day and, following my explanations, had already pretty much figured out the problem. He had come with a replacement piece of equipment but, instead of just replacing the faulty piece, he audited the entire bathroom. From that, we found that the hot water balloon was making the water too hot which, in turn, was dilating the joint in the shower's temperature dial which, in the end, was rendered incapable of delivering the desired temperature. We agreed that, apart from replacing the faulty piece of equipment, he would also set the balloon to a lower temperature to avoid creating problems in the future.
What I want to demonstrate here is that a true professional is going to do his best to ensure that the customer is satisfied when he leaves. It requires dedication and knowledge of all the areas connected to the one you're called in to work on, but such a person is priceless.
Companies are not priceless. What does that say about where we're going?
Re: Blame the vendor
I guess that the main issue is that you're incentivised by customer satisfaction and the resultant word-of-mouth recommendations that might lead to more work. The support bod on the other end of the phone is more likely to be incentivised by making sure that tickets are closed* or pinged back to "waiting customer" as quickly as possible.
* "closed" doesn't have to mean "fixed", of course.
Re: Blame the vendor
Hot water... balloon?
Three languages
And I'm not talking about programming languages, where most of us are fluent in half a dozen or so.
1: Regulatorian: This is the language of politicians and lawyers. It sets the mandates on banks, hospitals, schools etc. It contains nuances and terms of art that sometimes make a word mean something totally different to what you would infer if you heard it in general conversation.
2: Beancounterese: Spoken by accountants, salesmen and middle manglement. It sounds very similar to Regulatorian but is sufficiently different in some of its meanings that it's as big a gulf as between Old Scots and English.
3: Geekian: The language of hard science, mathematics, real-world realities and the only one to use when specifying what a programmer needs to code. Because they will code what you tell them to, and it will work the way this language describes it.
The same word can mean different things in these three languages.
We have to be fluent in all three to accurately interpret requirements and predict what the emerging software will look like, and to take error logs and demonstrate to (sometimes hostile) manglement what corrective action is needed and where it needs to be applied.
Re: Three languages
It gets worse, as there are quite a few Geekian dialects. I have learnt to speak a couple over the years, and know the word "morphology" can have radically different meanings, depending on whether you are talking to a medical doctor, an astronomer, or an image processing specialist. Great fun when you are in a project with different geeks each speaking their own dialect.
Used to have similar finger pointing with our network team. Network failed, so we'd drop them a ticket to investigate. "Replace your NIC" was the almost inevitable response. "No, pretty sure it's the switch port or cabling, like it's been the last 20 times we've had this fault". Nope, "Replace your NIC".
We got into the habit of switching cables to prove our point. I can't recall any occasion the fault didn't follow the cable/switch port.
Or speaking to the storage team - slow disks in the EMC arrays - "Everything is fine our end". Are you sure? "Yup, everything is fine". Well, we're seeing this and this... "Oh, we found a hot spot on the disk and have rebalanced"... *sigh*
Treating managers as toddlers
is always a good starting point.
I have been surprised, but those events are rare.
Re: Treating managers as toddlers
I originally read the heading as " Training managers as toddlers ". It made perfect sense to me given the state of some manglement.
When Sybase ran conferences their content was organised into three streams: technical, semi-technical and management.
In terms of descending complexity that would equate to 1, 3 and 53749
I've seen this so many times - everyone throwing blame at everyone else without first checking that you are standing on firm ground. It never ends well for someone.