Kubernetes kicks down Azure Front Door
- Reference: 1760017598
- News link: https://www.theregister.co.uk/2025/10/09/kubernetes_azure_outage/
The Windows giant noted problems from 0740 UTC, with multiple regions around the world reporting issues with its services.
"Our monitoring detected a significant capacity loss of about 30 percent of Azure Front Door instances, predominantly across Europe, Middle East, and Africa," Microsoft said.
Surely it hadn't rolled out a borked update yet again? Not this time, it seems. The company continued: "We understand that this is due to a dependency on some underlying Kubernetes instances that crashed. We have ruled out any deployments that could have triggered this event."
That said, losing almost a third of one's capacity to crashing Kubernetes instances is less than ideal. A properly architected solution should be able to recover from whatever woes befall the orchestrator, but apparently Azure's could not.
Social media was its usual supportive self. Some users were [4]attempting to cancel their Game Pass subscriptions following a recent price hike, but were unable to log in due to issues with the service. Others wanted to find out what was going on, but [5]encountered errors suggesting that connectivity had been lost.
In a move familiar to many administrators when faced with similar problems, Microsoft's engineers reached for whatever serves as the on/off switch for its Kubernetes implementation. "We have been restarting these underlying Kubernetes instances, and AFD instances are coming back online," the company said.
The service appears to be staggering back to its feet, and Microsoft has stated that the vast majority of impacted resources have been restored. The company later [10]posted that it kicked off a failover for the Microsoft 365 Portal service to speed things along, and that "we've validated that the service has fully recovered."
This will come as some comfort to users of Microsoft's online services who might have encountered issues today. The company has not, however, elaborated on why the instances went down in the first place or why the recovery was not automatic.
We will update this article if the Windows giant responds to our queries. ®
[4] https://x.com/TheMasterPrawn/status/1976218066783703271
[5] https://x.com/guyrleech/status/1976211370107601123
[10] https://x.com/MSFT365Status/status/1976251534863192210
Yes, for a service designed to give "higher availability, reduced latency, increased scalability", it's caused us many hours of grief today. We won't know how much business we've lost until EOD, but it's a real blow to our trust in MS.
It didn't help that it took them 4 hours to even report it on the Azure Service Health page, leading us to think we had networking issues until I spoke to our MS handler, who immediately told us it was a known issue.
Trust???
… is that what you had in MicroSoft? Ahhhh, bless… obviously you haven't been working with them long enough yet!
Prepare for more ongoing disappointment… sorry for your loss
Bitten by Bitnami?
We've had a lot of fun dealing with Broadcom's changes at Bitnami causing problems in Kubernetes deployments.
Maybe Microsoft engineers didn't read the smallprint...
It wasn't just access to the portals or Entra; any website behind Front Door was suffering outages as well.
I have a lot of unhappy clients today.