Kubernetes kicks down Azure Front Door

(2025/10/09)

Reference: 1760017598
News link: https://www.theregister.co.uk/2025/10/09/kubernetes_azure_outage/

If you struggled to access the Azure Portal or Microsoft Entra this morning, you weren't alone – Microsoft has blamed a Kubernetes crash for the outage.

The Windows giant noted problems from 0740 UTC, with multiple regions around the world reporting issues with its services.

"Our monitoring detected a significant capacity loss of about 30 percent of Azure Front Door instances, predominantly across Europe, Middle East, and Africa," Microsoft said.

Surely it hadn't rolled out a borked update yet again? Not this time, it seems. The company continued: "We understand that this is due to a dependency on some underlying Kubernetes instances that crashed. We have ruled out any deployments that could have triggered this event."

That said, losing almost a third of one's capacity due to crashing Kubernetes instances is less than ideal. A properly architected service should be able to recover from whatever woes befall the orchestrator without manual intervention, but Azure's apparently could not.
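Microsoft has not said how Azure Front Door is supposed to heal itself, so the following is only a sketch of the general idea of automatic recovery: a reconciliation loop that compares the number of healthy instances against a desired count, then restarts or replaces the rest without waiting for an engineer. `Instance` and `reconcile` are hypothetical names invented for illustration. This is neither the Kubernetes API nor Microsoft's implementation, although in real Kubernetes, ReplicaSet controllers and the kubelet's liveness-probe restarts play a broadly similar role.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for one edge node or Kubernetes instance."""
    name: str
    healthy: bool = True
    restarts: int = 0


def reconcile(instances: list[Instance], desired: int) -> list[str]:
    """Run one pass of a reconciliation loop (hypothetical, illustrative only).

    Restarts every crashed instance, then creates new ones until at least
    `desired` healthy instances exist. Returns a log of the actions taken.
    """
    actions: list[str] = []
    for inst in instances:
        if not inst.healthy:
            # Simulated restart: mark the instance healthy and count the restart.
            inst.healthy = True
            inst.restarts += 1
            actions.append(f"restart {inst.name}")
    healthy = sum(1 for inst in instances if inst.healthy)
    while healthy < desired:
        # Top up capacity if restarts alone do not reach the desired count.
        new = Instance(name=f"node-{len(instances)}")
        instances.append(new)
        healthy += 1
        actions.append(f"create {new.name}")
    return actions


if __name__ == "__main__":
    # Simulate losing 3 of 10 instances, roughly the 30 percent reported.
    pool = [Instance(f"node-{n}") for n in range(10)]
    for inst in pool[:3]:
        inst.healthy = False
    print(reconcile(pool, desired=10))
    # ['restart node-0', 'restart node-1', 'restart node-2']
```

A real controller would run this pass continuously and would have to cope with restarts that fail, which the sketch deliberately ignores.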

Social media was its usual supportive self. Some users were [4]attempting to cancel their Game Pass subscriptions following a recent price hike, but were unable to log in due to issues with the service. Others wanted to find out what was going on, but [5]encountered errors suggesting that connectivity had been lost.

In a move familiar to many administrators when faced with similar problems, Microsoft's engineers reached for whatever serves as the on/off switch for its Kubernetes implementation. "We have been restarting these underlying Kubernetes instances, and AFD instances are coming back online," the company said.

The service appears to be staggering back to its feet, and Microsoft has stated that the vast majority of impacted resources have been restored. The company later [10]posted that it kicked off a failover for the Microsoft 365 Portal service to speed things along, and that "we've validated that the service has fully recovered."
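Microsoft has not described how its Microsoft 365 Portal failover works. As a rough illustration of the concept only, a client or load balancer can walk a priority-ordered list of endpoints and route traffic to the first one that passes a health check. `pick_endpoint` and the example.net hostnames are hypothetical.

```python
from typing import Callable, Sequence


def pick_endpoint(endpoints: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first endpoint, in priority order, that passes its health check.

    Hypothetical helper for illustration; raises RuntimeError if none is healthy.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")


if __name__ == "__main__":
    # Hypothetical hostnames; the primary is simulated as down.
    endpoints = ["primary.example.net", "secondary.example.net"]
    down = {"primary.example.net"}
    print(pick_endpoint(endpoints, lambda ep: ep not in down))
    # secondary.example.net
```

In practice the health check would be a real probe with timeouts, and deciding when to fail back to the primary needs rules of its own; both are left out here.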

This will come as some comfort to users of Microsoft's online services who might have encountered issues today. The company has not, however, elaborated on why the instances went down in the first place or why the recovery was not automatic.

We will update this article if the Windows giant responds to our queries. ®

Get our [12]Tech Resources

[4] https://x.com/TheMasterPrawn/status/1976218066783703271

[5] https://x.com/guyrleech/status/1976211370107601123

[10] https://x.com/MSFT365Status/status/1976251534863192210

Alister

It wasn't just the portals or Entra; any website behind Front Door was suffering outages as well.

I have a lot of unhappy clients today.

IFYates

Yes. For a service designed to give "higher availability, reduced latency, increased scalability", it's caused us many hours of grief today. We won't know how much business we've lost until end of day, but it's a real blow to our trust in MS.

It didn't help that it took them 4 hours to even report it on the Azure Service Health page, leading us to think we had networking issues until I spoke to our MS handler, who immediately told us it was a known issue.

Trust???

Jibberboy2000

… is that what you had in Microsoft? Ahhhh, bless… Obviously you haven't been working with them long enough yet!

Prepare for more ongoing disappointment… sorry for your loss.

Bitten by Bitnami?

Essuu

We've had a lot of fun dealing with Broadcom's changes at Bitnami causing problems in Kubernetes deployments.

Maybe Microsoft engineers didn't read the small print...