Cloudflare fesses up to config change that borked internet access for all
- Reference: 1752681554
- News link: https://www.theregister.co.uk/2025/07/16/cloudflare_fesses_up_to_config/
- Source link:
In a blog post, the content delivery network services biz [1]detailed the unfortunate series of events that led to Monday's disruption.
On the day itself, "Cloudflare's 1.1.1.1 Resolver service became unavailable to the internet starting at 21:52 UTC and ending at 22:54 UTC. The majority of 1.1.1.1 users globally were affected. For many users, not being able to resolve names using the 1.1.1.1 Resolver meant that basically all Internet services were unavailable," Cloudflare said.
But the problem originated much earlier.
The outage was caused by a "misconfiguration of legacy systems" which are used to maintain the infrastructure that advertises Cloudflare's IP addresses to the internet.
"The root cause was an internal configuration error and not the result of an attack or a BGP hijack," the corp said.
Back on June 6 this year, as Cloudflare was preparing a service topology for a future Data Localization Suite (DLS) service, it introduced the config gremlin - prefixes connected to the 1.1.1.1 public DNS Resolver were "inadvertently included alongside the prefixes that were intended for the new DLS service."
"This configuration error sat dormant in the production network as the new DLS service was not yet in use, but it set the stage for the outage on July 14. Since there was no immediate change to the production network there was no end-user impact, and because there was no impact, no alerts were fired."
On July 14, a second tweak to the service was made: Cloudflare added an offline datacenter location to the service topology for the pre-production DNS service in order "to allow for some internal testing." But the change triggered a refresh of the global configuration of the associated routes, "and it was at this point that the impact from the earlier configuration error was felt."
Things went awry at 2148 UTC.
"Due to the earlier configuration error linking the 1.1.1.1 Resolver's IP addresses to our non-production service, those 1.1.1.1 IPs were inadvertently included when we changed how the non-production service was set up… The 1.1.1.1 Resolver prefixes started to be withdrawn from production Cloudflare datacenters globally."
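The sequence Cloudflare describes, a dormant mapping error that only bites when an unrelated change forces a refresh, can be illustrated with a toy model. This is a sketch under stated assumptions, not Cloudflare's actual system: the service names, location names, the `refresh` function, and the use of `1.1.1.0/24` as the affected prefix are all hypothetical.

```python
# Toy model only: names and data structures are hypothetical,
# not Cloudflare's real configuration system.

# Each service topology lists the locations that should advertise its prefixes.
topologies = {
    "production-resolver": {"locations": {"dc-a", "dc-b", "dc-c"}},
    "pre-production-dls": {"locations": set()},  # new service, not yet in use
}

# June 6: a resolver prefix is mistakenly attached to the new service.
prefix_owner = {
    "1.1.1.0/24": "pre-production-dls",    # the dormant error (illustrative prefix)
    "192.0.2.0/24": "pre-production-dls",  # an intended prefix (documentation range)
}

# What the network keeps advertising until the next refresh.
advertised = {"1.1.1.0/24": {"dc-a", "dc-b", "dc-c"}}


def refresh(prefix_owner, topologies):
    """Recompute where every prefix should be advertised from current config."""
    return {
        prefix: set(topologies[service]["locations"])
        for prefix, service in prefix_owner.items()
    }


# No refresh has run, so production is unchanged and no alert fires.
assert advertised["1.1.1.0/24"] == {"dc-a", "dc-b", "dc-c"}

# July 14: an offline test location is added, which triggers a global refresh.
topologies["pre-production-dls"]["locations"].add("dc-offline-test")
advertised = refresh(prefix_owner, topologies)

# The resolver prefix now follows the non-production topology.
withdrawn_from = {"dc-a", "dc-b", "dc-c"} - advertised["1.1.1.0/24"]
print(sorted(withdrawn_from))  # ['dc-a', 'dc-b', 'dc-c']
```

The point of the sketch is that the bad mapping is harmless for as long as nothing recomputes the advertised state, which matches Cloudflare's account of why no alerts fired in June.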
[7]Massive spike in use of .es domains for phishing abuse
[8]Cloudflare creates AI crawler tollbooth to pay publishers
[9]Ingram Micro restarts orders – for some – following ransomware attack
[10]Uncle Sam wants you – to use memory-safe programming languages
Traffic began to drop four minutes later and internal health alerts started to emerge. An "incident" was declared at 2201 UTC and a fix dispatched at 2220 to restore the previous configuration.
"To accelerate full restoration of service, a manually triggered action is validated in testing locations before being executed," Cloudflare said in its explanation of the outage. Revolver alerts were cleared by 2254 UTC and DNS traffic on Resolver prefixes went back to typical levels, it added.
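For readers who want the intervals rather than the clock times, the timeline reported above works out as follows. This is a small sketch using only the timestamps quoted in this article; the event labels and the `minutes_between` helper are our own.

```python
from datetime import datetime

# Timestamps (UTC, July 14) as reported in the article.
events = {
    "withdrawal starts": "21:48",
    "resolver unavailable": "21:52",
    "incident declared": "22:01",
    "fix dispatched": "22:20",
    "alerts cleared": "22:54",
}


def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


print(minutes_between(events["withdrawal starts"], events["resolver unavailable"]))  # 4
print(minutes_between(events["resolver unavailable"], events["incident declared"]))  # 9
print(minutes_between(events["incident declared"], events["fix dispatched"]))        # 19
print(minutes_between(events["fix dispatched"], events["alerts cleared"]))           # 34
print(minutes_between(events["resolver unavailable"], events["alerts cleared"]))     # 62
```

By Cloudflare's own numbers, then, the user-visible outage ran for 62 minutes, of which 34 came after the fix was dispatched.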
Data on DNSPerf shared with us by a reader indicates the disruption lasted around three hours, far longer than Cloudflare's summary suggests.
As a Reg reader pointed out: "Remember this is a DNS service. Every person using the service would have had no ability to use the internet. Every business using Cloudflare had no internet for the length of the outage. NO DNS = NO INTERNET." ®
[1] https://blog.cloudflare.com/cloudflare-1-1-1-1-incident-on-july-14-2025/
[7] https://www.theregister.com/2025/07/05/spain_domains_phishing/
[8] https://www.theregister.com/2025/07/01/cloudflare_creates_ai_crawler_toll/
[9] https://www.theregister.com/2025/07/09/ingram_micro_restarts_orders_for/
[10] https://www.theregister.com/2025/06/27/cisa_nsa_call_formemory_safe_languages/
It's not DNS
There's no way it's DNS
It was DNS
(credit: http://i.imgur.com/eAwdKEC.png)
It's usually DNS.
DNS issues are so incredibly common that they're among the first things I check when I hear somebody whine "The WiFi is out".
And yes, that's the most common way I get an internet issue report. I don't even bother checking the WiFi first any more, because it's rarely the WiFi.
Rule 1. It's always DNS
Rule 2. If it isn't DNS it's a certificate
They are one of my forwarders...
They are one of my forwarders but not the only one, and I use PiHole so didn't notice.
Everyone who cares about staying connected should run a caching DNS server with multiple diverse forwarders.
Re: They are one of my forwarders...
Or maybe you could subscribe to an internet service provider that paid IT network professionals to do that shit for you?
Being a "prepper" isn't a good use of my time or specialist skills.
Re: They are one of my forwarders...
@tfewster
You mean put a secondary DNS entry in? It's not hard. If you don't know how to do that, go use Mumsnet or something.
Re: They are one of my forwarders...
Depends on the issue. If the primary is down, then the secondary will work, although resolution will be slower while each query waits for the primary to time out.
The real problem is if the primary is up but providing invalid data. If it comes back with NXDOMAIN you will never query the secondary.
Sometimes it is simpler just to have 1 resolver, then manually cut over to a secondary when you have established there is a problem.
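The distinction drawn above, between a primary that is down and a primary that is up but wrong, can be shown with a short simulation. This is a sketch that assumes the common stub-resolver behaviour of moving to the next server only when no answer arrives; the exception classes and the fake resolver functions are hypothetical stand-ins, and no real network traffic is involved.

```python
class ResolverTimeout(Exception):
    """The resolver did not answer in time."""


class NXDomain(Exception):
    """The resolver answered that the name does not exist."""


def resolve_with_fallback(name, resolvers):
    """Try resolvers in order, moving on only when one fails to answer."""
    for resolver in resolvers:
        try:
            return resolver(name)
        except ResolverTimeout:
            continue  # no answer at all: try the next resolver
        # NXDomain is a valid answer, so it propagates to the caller.
    raise ResolverTimeout(f"no resolver answered for {name}")


calls = []  # records which fake resolvers were actually queried


def down(name):
    calls.append("down")
    raise ResolverTimeout(name)


def lying(name):
    calls.append("lying")
    raise NXDomain(name)


def healthy(name):
    calls.append("healthy")
    return "192.0.2.10"  # documentation address, not a real lookup result


# Primary is down: the secondary is used after the timeout.
print(resolve_with_fallback("example.com", [down, healthy]))  # 192.0.2.10

# Primary is up but wrong: the secondary is never consulted.
calls.clear()
try:
    resolve_with_fallback("example.com", [lying, healthy])
except NXDomain:
    print(calls)  # ['lying']
```

In the second case the healthy resolver is never asked, which is why listing a secondary does not protect against a primary that returns bad answers.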
Screwed Up Cloudflare !!
This might be related, or it might not.
But Cloudflare hosts a few UK concert ticket sellers and about 2 weeks ago, one of these resellers was doing a great deal on tickets for a band I like.
So, I visited the relevant website and tried to buy tickets...but I got a Cloudflare error message that said that my IP address belonged to a scammer and it refused my custom.
And there was no easy way to report this to either the ticket seller or Cloudflare tech support via this error page.
If a Cloudflare "upgrade" or change to their hosted system was ongoing then this might explain my issue?
Re: Screwed Up Cloudflare !!
Assuming you're on a residential ISP, and you don't have a static address, all it means is that your dad, or one of your neighbors, has been up to no good, and now you have the address that they were using.
Screwed up but confessed
We all make mistakes. The impact of some people's mistakes is bigger than that of others.
At least Cloudflare admitted it and explained how they got it wrong. That earns some forgiveness (from me at least), unlike those who try to blame someone else.
That's one way to promote rapid resolution
"Revolver alerts were cleared by 2254 UTC"
Would that be a 38special alert or a 357magnum alert?
@ElReg
"Every business using Cloudflare had no internet for the length of the outage. NO DNS = NO INTERNET."
This is the 1.1.1.1 service; anyone using, say, Google's DNS would have seen no issue.
As The Reg is a Cloudflare user, did your site go down? No.
If a business does not have a secondary DNS set up then they really should be looking at their IT team and going WTF?
I have 2 PCs. One is manually configured for 1.1.1.1. As soon as I realised it looked like there were DNS issues I tested 8.8.8.8, which worked.
The other machine is configured for DHCP and uses the router for DNS. The router was configured for the ISP DNS servers. They must be using 1.1.1.1 for upstream DNS as that stopped working at the same time. Again, switching to 8.8.8.8 was a temporary fix.
Why not add both? Or use 1.1.1.1, 8.8.8.8 and 9.9.9.9, then Cloudflare, Google, and Quad9 would all have to go offline before DNS stops working for you.
Because I have had problems in the past where the primary is still up but providing bogus info. That takes longer to troubleshoot than just knowing there is something up with DNS and cutting over to a different provider. Plus, if you are waiting for down servers to time out, DNS resolution is slower.
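Testing one specific resolver directly, as described earlier in this thread, can be done without touching the system DNS settings. The following is a minimal sketch using only the Python standard library: it hand-builds a single A-record query in the RFC 1035 wire format and reports whether a given server replies at all. The function names and the fixed query ID are our own choices, and it checks reachability only, not whether the answer is correct.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def resolver_answers(server, name="example.com", timeout=2.0):
    """Return True if the server sends back a reply with a matching ID in time."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
    # A DNS header is 12 bytes; the first two must echo our query ID.
    return len(reply) >= 12 and reply[:2] == query[:2]
```

Calling `resolver_answers("1.1.1.1")` and `resolver_answers("8.8.8.8")` in turn gives a quick indication of which provider is responding. It would not catch the "up but providing bogus info" case raised above, since any reply with a matching ID counts as an answer.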
If they're idiots
Seriously, El Reg?
"As a Reg reader pointed out: 'Remember this is a DNS service. Every person using the service would have had no ability to use the internet. Every business using Cloudflare had no internet for the length of the outage. NO DNS = NO INTERNET.'"
Only if they're idiots. Nobody who isn't stupid relies on a single DNS provider.