News: 0179845686


Amazon's DNS Problem Knocked Out Half the Web, Likely Costing Billions

(Tuesday October 21, 2025 @05:45PM (BeauHD) from the here-we-go-again dept.)


An anonymous reader quotes a report from Ars Technica:

> On Monday afternoon, Amazon [1]confirmed that an outage affecting Amazon Web Services' cloud hosting, which had [2]impacted millions across the Internet, had been resolved. Considered the worst outage since [3]last year's CrowdStrike chaos, Amazon's outage caused "global turmoil," Reuters [4]reported. AWS is the world's largest cloud provider and, therefore, the "backbone of much of the Internet," ZDNet [5]noted. Ultimately, more than 28 AWS services were disrupted, [6]causing perhaps billions in damages, one analyst [7]estimated for CNN.

>

> [...] Amazon's problems originated at a US site that is its "oldest and largest for web services" and often "the default region for many AWS services," Reuters noted. The same site suffered outages in 2020 and 2021, and while the tech giant confirmed at the time that those issues had been "fully mitigated," the fixes evidently did not ensure stability into 2025. ZDNet noted that Amazon's first sign of the outage was "increased error rates and latency across numerous key services" tied to its cloud database technology. Although "engineers later identified a Domain Name System (DNS) resolution problem" as the root of these issues and quickly fixed it, "other AWS services began to fail in its wake, leaving the platform still impaired" as more than two dozen AWS services shut down. At the peak of the outage on Monday, Downdetector tracked more than 8 million reports globally from users panicked by the outage, ZDNet reported.

Ken Birman, a computer science professor at Cornell University, told Reuters that "software developers need to build better fault tolerance."

"When people cut costs and cut corners to try to get an application up, and then forget that they skipped that last step and didn't really protect against an outage, those companies are the ones who really ought to be scrutinized later."



[1] https://health.aws.amazon.com/health/status

[2] https://tech.slashdot.org/story/25/10/20/140248/aws-outage-takes-thousands-of-websites-offline-for-three-hours

[3] https://it.slashdot.org/story/24/07/19/0943232/global-it-outage-linked-to-crowdstrike-update-disrupts-businesses?sdsrc=rel

[4] https://www.reuters.com/business/retail-consumer/amazons-cloud-unit-reports-outage-several-websites-down-2025-10-20/

[5] https://www.zdnet.com/home-and-office/networking/the-massive-aws-outage-that-broke-half-the-internet-is-finally-over-heres-what-happened/

[6] https://arstechnica.com/tech-policy/2025/10/amazons-dns-problem-knocked-out-half-the-web-likely-costing-billions/

[7] https://www.cnn.com/business/live-news/amazon-tech-outage-10-20-25-intl



but, but, but (Score:5, Insightful)

by flippy ( 62353 )

it's a GOOD thing that one company controls that much of the internet, right? I mean, super efficient.

Ken Birman is an idiot. (Score:3)

by Brain-Fu ( 1274756 )

Or maybe he was quoted out of context.

When you use AWS to host your business's website, and/or all the data your business processes, and/or whatever back-end web-facing APIs your business uses, no amount of "fault tolerance" is going to keep you afloat when AWS goes down.

If we want to blame the victim, the correct accusation is: "you shouldn't outsource your critical business infrastructure to a huge megacorp that can survive without you."

Re: (Score:2)

by suutar ( 1860506 )

I think in this case, since it was only one region, we can fall back to "you shouldn't outsource anything significantly important without a multi-region failover plan."
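
For what it's worth, a multi-region failover plan often starts at DNS. Here is a sketch using Route 53's failover routing policy via boto3 (the zone ID, health-check ID, names, and addresses are placeholders, and the health check is assumed to already exist):

    import boto3

    route53 = boto3.client("route53")

    # Placeholder identifiers; real values come from your hosted zone and checks.
    ZONE_ID = "Z0000000000000000000"
    HEALTH_CHECK_ID = "00000000-0000-0000-0000-000000000000"

    def upsert_failover_pair(name, primary_ip, secondary_ip):
        """Create PRIMARY/SECONDARY records so DNS answers shift to the
        secondary region when the primary's health check fails."""
        changes = [
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": name, "Type": "A", "TTL": 60,
                "SetIdentifier": "primary-us-east-1",
                "Failover": "PRIMARY",
                "HealthCheckId": HEALTH_CHECK_ID,
                "ResourceRecords": [{"Value": primary_ip}],
            }},
            {"Action": "UPSERT", "ResourceRecordSet": {
                "Name": name, "Type": "A", "TTL": 60,
                "SetIdentifier": "secondary-us-west-2",
                "Failover": "SECONDARY",
                "ResourceRecords": [{"Value": secondary_ip}],
            }},
        ]
        route53.change_resource_record_sets(
            HostedZoneId=ZONE_ID, ChangeBatch={"Changes": changes}
        )

    upsert_failover_pair("app.example.com.", "198.51.100.1", "198.51.100.2")

The low TTL matters: it bounds how long clients keep resolving to the dead region after failover kicks in.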

Re: (Score:2)

by anoncoward69 ( 6496862 )

This is why you don't host your critical service / product / infrastructure in a single AZ.
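
In AWS terms, the minimum version of that is spreading a service across several Availability Zones. A sketch with boto3 (the group and template names are placeholders, and the launch template is assumed to already exist):

    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-tier",
        LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
        MinSize=3,
        MaxSize=9,
        DesiredCapacity=3,
        # One instance per zone: losing a single AZ leaves two still serving.
        AvailabilityZones=["us-east-1a", "us-east-1b", "us-east-1c"],
    )

Though, as the article makes clear, Monday's fault hit region-level services, which is exactly the case multi-AZ alone doesn't cover.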

Re: (Score:2)

by anoncoward69 ( 6496862 )

And if it's really, really super critical, you should probably host it with more than one cloud provider as well, 'cause multiple AZs aren't going to save your ass if there's some kind of billing / accounting mishap at AWS and your cloud account goes poof.
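
On the client side, the multi-provider version of that can be as blunt as trying independently hosted mirrors in order; a minimal sketch (the URLs are hypothetical):

    import urllib.request
    import urllib.error

    # Hypothetical mirrors of the same API, hosted with different providers.
    ENDPOINTS = [
        "https://api-aws.example.com/status",
        "https://api-gcp.example.com/status",
    ]

    def fetch_first_available(urls, timeout=3.0):
        """Return the body from the first mirror that answers; raise if none do."""
        last_error = None
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    return resp.read()
            except (urllib.error.URLError, TimeoutError) as exc:
                last_error = exc  # this provider is down; try the next one
        raise RuntimeError("all endpoints failed: %s" % last_error)

The hard part isn't this loop; it's keeping the data behind both mirrors in sync.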

Re: (Score:2)

by Conchobair ( 1648793 )

East went down. West was up the whole time. Any company not run by idiots has it in the contract that they can easily switch to their West instance when East goes down.

Re: (Score:2)

by alvinrod ( 889928 )

For large failures that won't save you. Does Amazon have enough infrastructure to run all of the East instances on their West hardware? That's doubtful, and if they tried, it would degrade performance if not outright take down West due to the load.

Having someone to pick up the slack is only possible if there's excess infrastructure in place to handle it. If there were dozens of smaller players this wouldn't be a problem, but if there are only two or three major providers, none of them will overprovision enough.

Re: (Score:1)

by linuxuser3 ( 3973525 )

Only one problem with that idea: my systems were up and online all day yesterday, connecting flawlessly, and I didn't even know AWS was down or having problems till late afternoon, when I had a chance to browse the news. As in all things, diversification is the best disaster protection. When the DNS root servers are all hosted on AWS, then we're in trouble. But with Cloudflare and Control-D over DoT, my systems' connectivity was fine yesterday, and my Tor relay node was up and online all day long. AWS went down yesterday?

It was just doing what clouds do... (Score:4, Insightful)

by AmazingRuss ( 555076 )

... suddenly disappearing with a 'poof'.

Real problem - half the web belongs to Amazon (Score:3)

by sinij ( 911942 )

As the saying goes: if you owe the bank a million, they own you; if you owe the bank a billion, you own them.

What was actually damaged/destroyed (Score:2, Interesting)

by Anonymous Coward

A website I use for work was down, but I just worked on other stuff and then later when it was back up I did the stuff I would have earlier. Nothing of value was lost. And I don't mean the website that was down isn't valuable, it's important to a lot of my work, but it's back up and the dollar value of the downtime to me or to my employer is basically zero. If we had to replace it entirely, the cost would be substantial, but just not being able to use it for a few hours or an entire day costs nothing.

I have a hard time believing anything worth billions was destroyed. Maybe some purchases got delayed. Maybe some things got bought from a different source. Maybe some people worked on restoring service instead of what they would have been working on, but then got around to doing the work that they would have done.

Maybe some people had an easy day of sitting around accomplishing less than they would have but then followed that with a busy day of working faster than usual to catch up.

Anybody have any anecdotes about things that actually got damaged or destroyed that could possibly account for a claim of "billions"?

Re:What was actually damaged/destroyed (Score:4, Interesting)

by nightflameauto ( 6607976 )

> A website I use for work was down, but I just worked on other stuff and then later when it was back up I did the stuff I would have earlier. Nothing of value was lost. And I don't mean the website that was down isn't valuable, it's important to a lot of my work, but it's back up and the dollar value of the downtime to me or to my employer is basically zero. If we had to replace it entirely, the cost would be substantial, but just not being able to use it for a few hours or an entire day costs nothing.

> I have a hard time believing anything worth billions was destroyed. Maybe some purchases got delayed. Maybe some things got bought from a different source. Maybe some people worked on restoring service instead of what they would have been working on, but then got around to doing the work that they would have done.

> Maybe some people had an easy day of sitting around accomplishing less than they would have but then followed that with a busy day of working faster than usual to catch up.

> Anybody have any anecdotes about things that actually got damaged or destroyed that could possibly account for a claim of "billions"?

You're looking at this logically. Stop that.

I'm sure that what they've done here is calculate how much revenue is generated via the hosted services per hour, multiply it by the downtime, and shove that number out as the number of dollars lost. As you say, it won't actually be that high, but everything now has to be about profit gained or lost. And really the only way to even begin to get people to have a conversation about whether throwing all your digital eggs in one basket is a good idea is to scare the shit out of them by showing them potential lost profits.

I don't personally love that this is the timeline we're in, but the scare should be real here. And big decision makers need big numbers flashing in their face or they won't even think about changing their methodology.

Re: (Score:2)

by irchans ( 527097 )

> You're looking at this logically. Stop that.

LOL :)

Re: What was actually damaged/destroyed (Score:1)

by PoopMelon ( 10494390 )

Money not earned is money lost. Imagine you run a bakery: you pay for the employees and other operating costs of the store, but the doors won't open for 2 days because something broke. That means you've lost 2 days' worth of profit.
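
Put invented numbers on the bakery and the revenue/profit distinction gets clearer:

    # Illustrative figures only: fixed costs keep accruing while revenue stops.
    daily_revenue = 2000.0   # sales the bakery would normally take in
    daily_costs = 1500.0     # wages, rent, etc., still owed while closed
    outage_days = 2

    lost_revenue = daily_revenue * outage_days                 # 4000.0
    costs_still_paid = daily_costs * outage_days               # 3000.0
    lost_profit = (daily_revenue - daily_costs) * outage_days  # 1000.0

The forgone profit plus the costs paid while closed add up exactly to the lost revenue, which is why "money not earned is money lost" is fair arithmetic when costs are fixed.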

Too many points of failure (Score:2)

by xack ( 5304745 )

Think about how much you rely on your DNS server, your DHCP lease, your clock being right so certificates validate, and having the right combination of browser so you are considered "human" and not a bot. In the old days of the internet you just dialed up and fetched simple static HTML pages; now we have vibe-coded contraptions with huge dependency graphs. I still think it is inevitable that Microsoft will corrupt the Secure Boot process somehow and render all Windows PCs unbootable as the ultimate screw-up.

House of Cards. (Score:4, Insightful)

by Fly Swatter ( 30498 )

Between AWS, Cloudflare, Google, Microsoft, and whoever else, there are too many single points of failure. But then, that is the design philosophy for any modern infrastructure or manufacturing project.

All about the money (Score:5, Insightful)

by YuppieScum ( 1096 )

> Ken Birman, a computer science professor at Cornell University, told Reuters that "software developers need to build better fault tolerance."

Ken is missing the point: management needs to budget for better fault tolerance; then the developers can build it.

> "When management cut costs and cut corners to try to get an application up, and then don't care that they skipped that last step and didn't really protect against an outage, those managers are the ones who really ought to be sacked later."

FTFY, Ken.

Re: (Score:2)

by Kernel Kurtz ( 182424 )

> Ken is missing the point: management need to budget for better fault tolerance, then the developers can build it.

When I worked in the enterprise, my experience was that fault tolerance was more often deficient due to inadequate hardware budgets than to lacking software capabilities, but I know nothing about this particular event.

Re: All about the money (Score:2)

by topham ( 32406 )

It's almost rudimentary today to have fault tolerance in the design. Horizontal scaling automatically gains some level of fault tolerance unless you specifically build without it.

Which is usually a budget constraint, not a developer constraint.

My sites are on AWS, and they all stayed up yesterday. But they aren't multi-region. That's the risk we take.

Re: (Score:2)

by YuppieScum ( 1096 )

> ...fault tolerance was more often deficient due to inadequate hardware budgets than to lacking software capabilities...

That's fair as far as it goes, but adding hardware doesn't magically make fault tolerance happen, unless your platform is Tandem's NonStop.

Ultimately, unless there is sufficient money to pay for the software development, hardware suite (dev, test, and prod), testing, and ongoing maintenance of a fault-tolerant system, then you get a system that stops working if, or rather when, there are faults.

The Great Oops will eventually happen (Score:4, Insightful)

by ebunga ( 95613 )

At some point AWS is going to delete all customer data, or otherwise cause it to be unrecoverable. AWS is too large and complicated not to eventually suffer a catastrophic failure at that scale. It's inevitable.

OMG!!! The Intertubez are unstable!!!... GAH!!! (Score:1)

by NoOnesMessiah ( 442788 )

Sweet mother of 110-baud modems! Yes, people, sh*t breaks. Welcome to my world since about 1984. While we've gotten better at it, it's still not fool-proof, and there are a LOT of fools out there. And you can claim that "billions were lost," but those are also ephemeral billions that wouldn't even exist today without the Internet and the greedy b*st*rds that have taken extreme advantage of it. News flash: Cloudflare is important too, and they've had outages. Facebook has had some super-entertaining, extended outages of its own.

Business as Usual (Score:2)

by RossCWilliams ( 5513152 )

This is typical of the tech world in general. There will be no real consequences for Amazon compared to the costs. The billions in damages will be left for others to pick up.
