Amazon Web Services’ US-EAST-1 region in trouble again, with EC2 and container services impacted
- Reference: 1761700602
- News link: https://www.theregister.co.uk/2025/10/29/aws_us_east_1_more_problems/
At 3:36 PM PDT on October 28 (10:36 PM UTC), the cloud colossus [2] advised customers that “Earlier today some EC2 launches within the use1-az2 Availability Zone (AZ) experienced increased latencies for EC2 instance launches.”
US-EAST-1 region is not the cloudy crock it's made out to be, claims AWS EC2 boss [3] READ MORE
Amazon throttled some requests for EC2 resources, but said retrying a request should resolve the issue.
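Amazon's workaround, in other words, is the oldest trick in the book: try again. For readers who want to automate that, the snippet below is a minimal sketch – our illustration, not anything AWS published for this incident – of retrying throttled RunInstances calls with boto3, combining botocore's adaptive retry mode with a manual exponential backoff. The AMI ID, instance type, and retry counts are placeholder assumptions.

    # Minimal sketch: retry EC2 launches that fail due to throttling.
    # All identifiers below are placeholders, not values from the incident.
    import time

    import boto3
    from botocore.config import Config
    from botocore.exceptions import ClientError

    ec2 = boto3.client(
        "ec2",
        region_name="us-east-1",
        config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
    )

    def launch_with_retry(max_manual_retries=5):
        """Call RunInstances, backing off when AWS throttles the request."""
        for attempt in range(max_manual_retries):
            try:
                return ec2.run_instances(
                    ImageId="ami-0123456789abcdef0",  # placeholder AMI
                    InstanceType="t3.micro",
                    MinCount=1,
                    MaxCount=1,
                )
            except ClientError as err:
                code = err.response["Error"]["Code"]
                if code in ("RequestLimitExceeded", "InsufficientInstanceCapacity"):
                    time.sleep(2 ** attempt)  # exponential backoff, then retry
                    continue
                raise
        raise RuntimeError("EC2 launch still failing after retries")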
The incident also created “task launch failure rates for [Elastic Container Service] ECS tasks for both EC2 and Fargate for a subset of customers in the US-EAST-1 Region.”
Amazon’s status page advises that ECS operates cells in the US-EAST-1 Region “and a small number of these cells are currently experiencing elevated error rates launching new tasks and existing tasks may stop unexpectedly.” The cloudy concern also warned that customers “may also see their container instances disconnect from ECS which can cause tasks to stop in some circumstances” but advised it had identified the problem and was working to fix it.
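For admins wondering whether their own clusters sit in one of the unlucky cells, one quick check – a sketch of ours, not an AWS-prescribed diagnostic – is to ask the ECS API which container instances have lost their agent connection. The cluster name below is a placeholder.

    # Minimal sketch: list ECS container instances whose agent has disconnected,
    # the symptom AWS's status page says can cause running tasks to stop.
    import boto3

    ecs = boto3.client("ecs", region_name="us-east-1")

    def find_disconnected_instances(cluster="my-cluster"):  # placeholder cluster name
        """Return ARNs of container instances that have lost contact with ECS."""
        arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
        if not arns:
            return []
        described = ecs.describe_container_instances(
            cluster=cluster, containerInstances=arns
        )["containerInstances"]
        # agentConnected flips to False when the instance disconnects from ECS.
        return [
            ci["containerInstanceArn"] for ci in described if not ci["agentConnected"]
        ]

    if __name__ == "__main__":
        for arn in find_disconnected_instances():
            print("Disconnected from ECS:", arn)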
The incident also impacted EMR Serverless – the serverless flavor of Elastic MapReduce, the service Amazon offers to run big data tools like Hadoop and Spark.
Here we go again
At 5:31 PM PDT AWS updated its advice to reveal that “EMR Serverless maintains a warm pool of ECS clusters to support customer requests, and some of these clusters are operating in the impacted ECS cells.”
Amazon said it was “actively working on refreshing these warm pools with healthy clusters” and that it had made progress “on recovering impacted ECS cells, but progress is not visible externally.”
“ECS has stopped new launches and tasks on the affected clusters. Some services (such as Glue) are observing recovery for error rates, but may still be experiencing increased latency,” AWS’s status page advises, before stating a “current best estimate of an ETA is 2-3 hours away.”
[7]Major AWS outage across US-East region breaks half the internet
[8]Cloudflare Q3 report shows the internet still breaks for the strangest reasons
[9]The perfect AWS storm has blown over, but the climate is only getting worse
[10]With impeccable timing, AWS debuts automated cloud incident report generator
AWS has not posted any information about the cause of this incident, but whatever is behind it is bad news: the reason for last week’s incident [1] was that many AWS services relied on the operation of another – the DynamoDB database.
In this incident, the problems with EMR Serverless stem from issues with ECS – again showing that internal dependencies make the Amazonian cloud fragile.
AWS lists ten services impacted by this incident, among them App Runner, Batch, CodeBuild, Fargate, Glue, EMR Serverless, EC2, ECS, and the Elastic Kubernetes Service – but at the time of writing The Register isn’t seeing reports of service disruptions. That may be because US-EAST-1 is home to six availability zones, meaning plenty of AWS resources remain available if customers have chosen to use them. Nor is this incident a full outage, so AWS may well have spare resources in the impacted availability zone that customers can use instead of the broken or throttled bits. ®
[1] https://www.theregister.com/2025/10/20/amazon_aws_outage/
[2] https://health.aws.amazon.com/health/status
[3] https://www.theregister.com/2024/04/10/aws_dave_brown_ec2_futures/
[7] https://www.theregister.com/2025/10/20/amazon_aws_outage/
[8] https://www.theregister.com/2025/10/28/cloudflare_q3_internet_disruption/
[9] https://www.theregister.com/2025/10/27/aws_outage_opinion/
[10] https://www.theregister.com/2025/10/23/aws_cloudwatch_automated_incident_reports/
Maybe they should run their stuff in Azure!
"The Cloud" is just somebody else's computer that you pay an outrageous set of fees to use.
But "The Cloud" computer is managed by human beings in the end, and controlled by largely human-written software (which typically puts so called "vibe" code to shame for quality and reliability in reality), and mistakes will happen .
But when "The Cloud" makes a mistake, it makes it for thousands and tens of thousands of systems. Not just one corporate server cluster.
After seeing my bill on trial runs of cloud services, I rapidly realized that with a real workload I could buy self-hosting hardware for the cost of about 4-5 months of cloud services. I didn't buy self-hosting hardware, but it convinced me that self-hosting or co-location typically puts "The Cloud" to shame on price, even when you factor in a decent number of sysadmins (you need sysadmins for the hosts you run, not for Amazon EC2 service configurations, but that's "six of one; half a dozen of the other").
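For the curious, here's the back-of-the-envelope version of that claim – every number below is made up for illustration, not taken from my actual bill or any hardware quote:

    # Hypothetical numbers only; adjust for your own workload.
    CLOUD_MONTHLY_BILL = 4_000      # assumed steady-state cloud spend, USD/month
    HARDWARE_CAPEX = 18_000         # assumed servers + networking purchase, USD
    SYSADMIN_MONTHLY_COST = 1_500   # assumed share of sysadmin time, USD/month
    COLO_MONTHLY_COST = 500         # assumed rack space, power, bandwidth, USD/month

    # Months of cloud spend needed to cover the hardware purchase alone.
    payback_months = HARDWARE_CAPEX / CLOUD_MONTHLY_BILL

    # Ongoing monthly saving once the hardware is paid off.
    monthly_saving = CLOUD_MONTHLY_BILL - (SYSADMIN_MONTHLY_COST + COLO_MONTHLY_COST)

    print(f"Hardware pays for itself in about {payback_months:.1f} months of cloud spend")
    print(f"After that, self-hosting saves roughly ${monthly_saving:,.0f} per month")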
Prompt
Let me guess. Prompt was missing the "Ensure this time it works." phrase at the end.