Signal president Meredith Whittaker says they had no choice but to use AWS, and that's a problem
- Reference: 1761595646
- News link: https://www.theregister.co.uk/2025/10/27/signal_ceo_meredith_whittaker_aws_dependency/
- Source link:
Signal, like many other internet services, failed briefly during the [1]sizable AWS outage that occurred on October 19 and 20. The cause, as AWS explained in its [2]paragraph-starved post-mortem last week, was an error in AWS' automated DNS management system. The resulting loss of availability and productivity across the many AWS-dependent businesses has been estimated to have cost them [3]more than a hundred billion dollars.
AWS has about a third of the global market share for cloud computing services, [4]according to Synergy Research Group.
But a former AWS employee who corresponded with The Register argues the figure is more like half of the cloud computing market because AWS runs backend services for notional rivals like IBM, Oracle, and Salesforce. A recent [6]report from HG Insights puts the number of businesses using AWS at more than 4 million, with particular concentrations within media, retail, internet services, manufacturing, and education. Our insider tells us thousands of government agencies also depend on AWS, including some national security workloads.
Signal president Meredith Whittaker called attention to this massive dependency in [9]a thread on the Mastodon social network, explaining how the concentration of power among cloud hyperscalers limits the options of services like Signal in terms of resiliency and network control.
Whittaker said that the concentration of power among cloud hyperscalers (AWS, Google, and Microsoft) is less widely understood than she expected, which bodes poorly for efforts to craft realistic strategies to change this dynamic.
She explained, "The question isn't 'why does Signal use AWS?' It's to look at the infrastructural requirements of any global, real-time, mass comms platform and ask how it is that we got to a place where there's no realistic alternative to AWS and the other hyperscalers."
The technical challenges for a service like Signal, Whittaker said, involve running a low-latency platform for instant communications that can carry millions of concurrent audio and video calls. That requires infrastructure around the globe – computing, storage, and edge nodes. And that infrastructure must be powered, monitored, and repaired.
"Such infrastructure costs billions and billions of dollars to provision and maintain, and it's highly depreciable," said Whittaker. "In the case of the hyperscalers, the staggering cost is cross-subsidized by other businesses–themselves also massive platforms with significant lock-in."
The result is that most companies, Signal included, can't afford to replicate AWS' global network of data centers and computing power.
And even if Signal could afford to do so, she said, the talent to oversee global scale cloud computing is scarce.
"In short, the problem here is not that Signal 'chose' to run on AWS," said Whittaker. "The problem is the concentration of power in the infrastructure space that means there isn't really another choice: the entire stack, practically speaking, is owned by three to four players."
Whittaker said she hopes the recent AWS outage refocuses people's attention on the world's dependence on public cloud giants and encourages efforts to undo the concentration of power.
Europe, which has been thinking about [16]the problem of data sovereignty more seriously since the Trump administration took over in January, has found that it's easier to talk about avoiding US tech giants than it is to actually do so. For example, the official EU Cloud Sovereignty Framework has [17]come under fire from CISPE, a trade association of EU cloud providers, over concerns that the rules favor AWS, Microsoft Azure, and Google Cloud.
Plus, there's always the possibility that the Trump administration, in support of domestic economic advantage, could simply [18]turn off the internet in Europe – whether that involves DNS meddling or directives to US tech giants to withhold service – to secure consent for its demands.
The internet [19]is said – [20]though this is disputed – to have emerged from efforts to design a network that could survive nuclear war, a scenario that rather optimistically assumes the health of those operating the network. But it has already been captured by cloud capital expenditures. ®
[1] https://www.theregister.com/2025/10/23/amazon_outage_postmortem/
[2] https://aws.amazon.com/message/101925/
[3] https://edition.cnn.com/business/live-news/amazon-tech-outage-10-20-25-intl
[4] https://www.srgresearch.com/articles/cloud-market-jumped-to-330-billion-in-2024-genai-is-now-driving-half-of-the-growth
[5] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offprem/paasiaas&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=2&c=2aP_5ghC6JDRJmtF5MO-CUwAAABY&t=ct%3Dns%26unitnum%3D2%26raptor%3Dcondor%26pos%3Dtop%26test%3D0
[6] https://hginsights.com/blog/aws-market-report-buyer-landscape
[7] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offprem/paasiaas&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aP_5ghC6JDRJmtF5MO-CUwAAABY&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[8] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offprem/paasiaas&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aP_5ghC6JDRJmtF5MO-CUwAAABY&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[9] https://mastodon.world/@Mer__edith/115445701583902092
[10] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offprem/paasiaas&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=4&c=44aP_5ghC6JDRJmtF5MO-CUwAAABY&t=ct%3Dns%26unitnum%3D4%26raptor%3Dfalcon%26pos%3Dmid%26test%3D0
[11] https://www.theregister.com/2025/10/27/asia_tech_news_roundup/
[12] https://www.theregister.com/2025/10/27/aws_outage_opinion/
[13] https://www.theregister.com/2025/10/27/cispe_eu_sovereignty_framework/
[14] https://www.theregister.com/2025/10/27/un_cybercrime_convention_signed/
[15] https://pubads.g.doubleclick.net/gampad/jump?co=1&iu=/6978/reg_offprem/paasiaas&sz=300x50%7C300x100%7C300x250%7C300x251%7C300x252%7C300x600%7C300x601&tile=3&c=33aP_5ghC6JDRJmtF5MO-CUwAAABY&t=ct%3Dns%26unitnum%3D3%26raptor%3Deagle%26pos%3Dmid%26test%3D0
[16] https://www.theregister.com/2025/02/26/europe_has_second_thoughts_about/
[17] https://www.theregister.com/2025/10/27/cispe_eu_sovereignty_framework/
[18] https://www.politico.eu/article/donald-trump-eu-internet-europe-us-trade-war-data-cyber/
[19] https://www.rand.org/pubs/articles/2018/paul-baran-and-the-origins-of-the-internet.html
[20] https://www.internetmythen.de/en/index0258.html
[21] https://whitepapers.theregister.com/
Re: Really?
"If they are profitable enough to pay the AWS costs to run their service, they are profitable enough to do it themselves"
This statement is simply wrong. As the Signal CEO notes, the organization runs a global operation, and it's almost certainly cheaper to piggyback on existing cloud infrastructure than to, for example, build out highly-available points of presence around the globe. Signal is also a non-profit organization, which presumably means they're not swimming in cash with which to build out and maintain that infrastructure. For many organizations, there can certainly be a point at which on-prem hosting becomes more affordable than paying a cloud provider, and your example of running your services "in a colo" is probably one of them, since you clearly don't need significantly distributed systems, but different organizations face different challenges.
Re: Really?
It looks like it's an LLC?
https://en.wikipedia.org/wiki/Signal_Foundation#Signal_Messenger_LLC
And again, colo space is relatively cheap for what one gets. WAN connections are cheap compared to what they used to be. Internet bandwidth is cheap compared to what it used to be.
AWS has to do the same work as everyone else, but then they take a hunk in profit on top of it.
For them to say they have no other choice is just not true.
Re: Really?
From their Web site, Signal is a 501c3 nonprofit.
In any case, I suspect their CEO knows better what their challenges are than you do. Individual colo spaces, WAN connections, etc. may be inexpensive, but those costs, of course, mount as one needs more points of presence around the globe, and those costs, I suspect, are relatively small compared to the cost of deploying and supporting the operational technology needed to provide the actual service, which means network equipment, servers, storage, and the software needed to tie it all together. The value of global cloud providers is that they already have performed the heavy lifting in terms of deploying that stack; they've made the investment in the infrastructure, including software services, and are, in essence, renting it out to others. To correct your point, AWS has done the work that independent operators would have to do in order to provide similar capabilities.
So, to the question about whether Signal has a choice, the question is really whether they could have reached global coverage in a cost-effective manner without running on previously-built cloud infrastructure. The CEO's answer seems to be "no," but she points out that it's a deal with the Devil, one made by many organizations, which puts outsized power in the hands of the hyperscalers, in particular Amazon.
Again, because I suspect I know what the incoming comments will be, I'm not saying that public cloud is the right choice for every organization, but many companies and organizations find using cloud infrastructure highly beneficial for a variety of use cases.
Re: Really?
We had a lot of people that focused on resume driven development ... - Does this mean employees, presumably management with such decision making authority, directed services to be ported to AWS because it looks good on their resumes?
Re: Really?
Developers that decided that a service had to be in AWS because they convinced the PHB it would be better and cheaper there and never go down and it would look good on their resumes.
They built it with all sorts of SPOFs (single points of failure), documented nothing, got new jobs somewhere else and left operations people holding the bag with 5 digit monthly bills.
They would sign up for third party services with their own email address and a company credit card. The card would be near the expiration date, and the third party would send an email to the person no longer at the company to say it needed to be updated. No one would answer, and the third party would cut off the service, bringing the whole service down.
So how exactly did Skype circa 2008 (pre ebay, pre MS) manage to have a global messaging system that performed on par with (or somewhat better than) Signal today?
I think that Skype was peer-to-peer; the comms did not go through a central server.
As I understand it, Skype still relied on a central server for discovery and connection/session initiation, and message buffering.
A peer-to-peer connection for the voice channel is what WebRTC gives us today (again, as I understand it).
So the system seems much the same.
Depends on their use case specifically
and their requirements etc. Someone on LinkedIn mentioned to me last week something along the lines of "if you're going to build a CDN you have to use cloud", which is a line of BS. They thought pretty much all CDNs used public cloud (and I have no doubt many/most/all CDNs probably have some aspect of public cloud usage). CDNs are global, real-time, mass comms platforms as well, and all of the major ones and probably most/all of the minor ones use their own infrastructure for their edge. Not only for cost reasons but also (more important for them) for routing/traffic control reasons (less important for an app like Signal).
You can see pretty easily whether or not a CDN node (the most important part of a CDN) is using a public cloud or their own stuff by just looking at its IP: if the WHOIS info for the IP reveals a public cloud provider then that is clearly cloud; if it does not, then most likely it is their own infrastructure.
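A rough way to automate the check described above is sketched below. It swaps the WHOIS lookup for a comparison against a list of address prefixes, since providers such as AWS publish their IP ranges in machine-readable form. The provider names and prefixes here are placeholders (RFC 5737 documentation ranges), not real cloud ranges, and `find_cloud_owner` is a hypothetical helper written for this illustration.

```python
import ipaddress

# Illustrative prefixes only (RFC 5737 documentation ranges), standing in for
# a cloud provider's published address list. They are NOT real cloud ranges.
SAMPLE_CLOUD_PREFIXES = {
    "example-cloud-a": ["192.0.2.0/24"],
    "example-cloud-b": ["198.51.100.0/24", "203.0.113.0/25"],
}


def find_cloud_owner(ip, prefixes=SAMPLE_CLOUD_PREFIXES):
    """Return the provider whose prefix contains ip, or None if none match."""
    addr = ipaddress.ip_address(ip)
    for provider, cidrs in prefixes.items():
        for cidr in cidrs:
            if addr in ipaddress.ip_network(cidr):
                return provider
    return None


if __name__ == "__main__":
    for ip in ("192.0.2.10", "203.0.113.200", "8.8.8.8"):
        owner = find_cloud_owner(ip)
        print(ip, "->", owner or "no match in the sample list")
```

A miss only means the address is not in the list that was loaded. As the comment says, that makes own infrastructure likely, not certain.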
I know for example when Snapchat went public, here on el reg there was an article (https://www.theregister.com/2017/02/03/snap_files_for_ipo/), where Snapchat said they had committed to spending $400 MILLION PER YEAR to Google for their cloud stuff. Sorry, it's going to be hard to convince me that they can't build their own global network for a lot less than $400M per year... Snapchat is in a similar model as Signal I think ... ? (never having used Snapchat, though I do use Signal).
To me, one of the best (on paper) use cases for public cloud is you have to go from say ~100 CPU cores to 5,000 CPU cores for max of 2 hours per day (averaged over a month, so say max of 60 hours per month). Building infrastructure for ~60 hours per month of usage probably doesn't make sense (though I haven't run the numbers specifically). Another really good use case for public cloud is one-off things, such as, I think I have seen at least one article here on el reg about some group doing some kind of HPC test on cloud where they spun up a few thousand servers or something to do one test, then spun them down (never to be needed again). Obviously such situations are few and far between.
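The burst scenario above can be put into rough numbers. The sketch below uses only the figures from the comment (100 baseline cores, 5,000 peak cores, 60 burst hours per month) plus two assumptions that are not in the original: a 730-hour month, and an owned core costing the same per hour whether busy or idle. Staffing, power, and hardware lifetime are ignored, so treat it as back-of-envelope arithmetic and not a cost model.

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours / 12)

baseline_cores = 100   # from the comment
peak_cores = 5_000     # from the comment
burst_hours = 60       # from the comment: max burst hours per month

# Extra cores needed only during the burst.
extra_cores = peak_cores - baseline_cores

# Core-hours actually used if the burst capacity is rented on demand.
rented_core_hours = extra_cores * burst_hours

# Core-hours paid for if the same capacity is owned and sits there all month.
owned_core_hours = extra_cores * HOURS_PER_MONTH


def burst_utilization(hours_used, hours_per_month=HOURS_PER_MONTH):
    """Fraction of the month that owned burst capacity would be busy."""
    return hours_used / hours_per_month


def break_even_price_ratio(hours_used, hours_per_month=HOURS_PER_MONTH):
    """How many times pricier per core-hour renting can be before owning wins."""
    return hours_per_month / hours_used


if __name__ == "__main__":
    print("extra cores:", extra_cores)
    print("rented core-hours per month:", rented_core_hours)
    print("owned core-hours per month:", owned_core_hours)
    print("utilization if owned: {:.1%}".format(burst_utilization(burst_hours)))
    print("break-even price ratio: {:.1f}x".format(break_even_price_ratio(burst_hours)))
```

Under these assumptions the owned burst capacity would be busy roughly 8% of the time, so rented cores could cost around 12 times as much per core-hour before owning the hardware came out ahead. A steady-state load pushes that ratio toward 1, which is the commenter's point.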
(Again on LinkedIn) there was a clueless tech leader dude from State Farm who wrote a dumb post saying everyone should use cloud, at their scale they want their business not to be focused on computers etc (typical outsourcing BS). Anyway, I found it kind of ironic that more recently another person posted about how Geico (same industry as State Farm, insurance) spent a decade moving into public cloud, spending $300M/year, only to find out (why did it take a decade to find out?) that it cost them 2.5X more, and now they have reversed course.
But most anything with a real steady state load in 95%+ of use cases doesn't make sense to have on public cloud.
THAT SAID - if you are happy with overpaying your public cloud provider and don't care about the costs you are just a happy customer that is fine, continue to use them, just don't pretend that you are saving any money.
IaaS is broken by design, something I first wrote about 15 years ago and posted a link here on El reg, here is the link again
http://www.techopsguys.com/2010/10/06/amazon-ec2-not-your-fathers-enterprise-cloud/
Some back story to that: at the time the CEO of the company I was at was/is the sister of the head of Amazon cloud (who is now the CEO of Amazon). I actually met with him and his chief scientist back in 2010 to complain about their bad service and he spent a bunch of time apologizing for it. But that's not the real story. The real story is that even though I sent that link to my boss on that same day, and he read it and thought it was a well thought out, balanced post, someone over at Amazon got into a hissy fit and that came down on my employer (this was before noon on the same day as I posted it), which then gave me legal threats to take the post down (BS reasons). They threatened me again when I left the company (and triggered a mass exodus from the tech team, about a dozen came to the next company). I complied and hid the post for a few years; they eventually went out of business and I put it back up online about a decade ago.
I've started to think I will refer to these people (like that State Farm person above) as members of the "Cult of the Cloud" (for whatever reason I came up with a name sort of similar to "Cult of the Dead Cow"), where they can be faced with so many different facts and figures and they are so brainwashed that they just can't believe their eyes/ears (similar to "MAGA" folks). Same sort of thing applies to so many folks pushing Kubernetes as well (and "IaC" to a lesser extent). All complicated coping means to try to tame "the cloud". Make it simpler, don't use it. (I happily admit there are use cases for all of these things; they just don't apply to everyone (don't apply to most, really), and many of these folks think these things should apply to everyone.)
The post is still valid today, as the flawed design of IaaS remains unchanged.
I moved my last org out of AWS in early 2012 with a 7 month ROI, and followed with a decade of flawless operation.
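For readers wondering what a "7 month ROI" means in practice, it is usually a payback period: the upfront cost of moving in house divided by the monthly savings afterwards. The sketch below shows that arithmetic. Every dollar figure in it is hypothetical and chosen only so the result lands on seven months; the comment gives no actual costs.

```python
def payback_months(upfront_cost, cloud_monthly, in_house_monthly):
    """Months until the upfront spend is recovered by lower monthly costs."""
    monthly_savings = cloud_monthly - in_house_monthly
    if monthly_savings <= 0:
        raise ValueError("no monthly savings, so the investment never pays back")
    return upfront_cost / monthly_savings


if __name__ == "__main__":
    # All three figures are hypothetical placeholders, not from the comment.
    upfront_cost = 700_000      # hardware, migration work, colo setup
    cloud_monthly = 150_000     # current cloud bill
    in_house_monthly = 50_000   # colo space, bandwidth, support contracts

    months = payback_months(upfront_cost, cloud_monthly, in_house_monthly)
    print("payback period: {:.1f} months".format(months))
```

The same function makes the sensitivity obvious: halve the monthly savings and the payback period doubles.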
Re: Depends on their use case specifically
To me, one of the best (on paper) use cases for public cloud is you have to go from say ~100 CPU cores to 5,000 CPU cores for max of 2 hours per day ...
In the days of new internet startups, in particular social media, waiting for that lucky viral moment when usage would suddenly explode was key to getting an IPO, and missing such an opportunity was like missing the boat. That whole industry though doesn't produce anything worthwhile except crappy LLM training data, and the negative cultural impact of viral+shallow is horrifying.
Re: Depends on their use case specifically
Viral moments should be cached by CDN. I worked for 2 social media startups in 2006-2008 and 2010-2011 (both in Seattle). The latter one used AWS (when I wrote that blog post). Their bill at times was in excess of $500,000/mo (I have always suspected that, due to the relationships, they likely did not pay the full dollar value on their bills, but I have no proof either way). Not because they had tons of users, but because things were in such a chaotic state with high turnover. They did have bursts of traffic but in the grand scheme of things it was not a lot of traffic. I had a plan with a 6 and a half month ROI for bringing stuff in house. I didn't like the company much so I spent WAY TOO MUCH time on that presentation and research and stuff (I enjoyed it). (The executive slideshow was only 15 pages including a few pages with mostly images; the full technical slide show covering every aspect of things was a full 170 pages.)
Everyone in the company was on board from my manager, to the CTO, the CEO, the software developers, everyone. The board shot the plan down and wanted to re-evaluate in a year or so. I left within a week of that. My manager resigned the day after I left, and a bunch more left soon after. My hiring manager at THAT company hired me at the next company where I spent over 10 years (that manager left after 2-3 years).
I know AWS' support is better now, but an example from the time: my (then) new manager had a decade of experience working at Amazon (we had many ex-Amazon employees including our CTO). Our CEO was the sister of the head of AWS. We were in the same city as AWS. My manager reached out and in a kind way said basically "everyone at my company hates your product, non stop problems. We must be doing something wrong, can you come on site and talk to us about what is going on? Knowing we spend a lot of $$ in your cloud and we have a lot of relationships with your leadership". Their answer? (Something along the lines of) "Tough shit, that's not our model, you figure it out". Even my manager was floored at the response. At an earlier company I was at, Oracle flew people on site on one occasion to deal with problems we were having (for multiple days), and we were spending a FRACTION on Oracle DB of what my social media company was spending on AWS. My (then) manager later went to work for Oracle cloud for a few years till he retired (he tried to hire me several times); another person on my team at that social media company still works for Oracle cloud as a tech architect of some kind (very smart guy, I didn't know him well).
Re: Depends on their use case specifically
memory triggered ... I remember one time the tech leadership of that social media company were freaking out claiming someone was attacking our site, and our site was crashing. It was crashing; they were hitting some "special" API endpoint. I don't recall the details other than it was something like not even 3 requests per second. It was a joke, what a terrible code base (made in part again due to high turnover, stress, death marches etc).
I also recall a couple of years after I left I happened to be in Seattle again visiting folks, and I got a call early in the morning on my cell phone. Someone was trying to get in touch with someone at the company but they could not find contact info. The website had nothing, and I guess they weren't trying very hard because they came to me; apparently my contact info was still on their domain even though I had left the company a long time ago. It seemed kind of strange... then he eventually came clean saying "I don't want to alarm you, but I am calling from the FBI". Oh, wow, ok. I never learned why they wanted to contact the company (it was legit as far as I know). This caller was in search of log events for something... I was able to contact the company and get him in touch with them. I sort of joked with the company saying "Hey your Splunk instance is on the internet, you can just give him a login to it". The app stack did support "user generated content" forums, and other things, so I imagine some users posted some illegal content of some kind and that triggered the response. Nobody told me what the end result was beyond that they were successfully in contact with the FBI.
Internet damage mitigation
The Internet, as in TCP/IP style packet switching, continues perfectly well if AWS goes down. It is the higher levels of services that depend upon AWS computing that go TITSUP when there is a problem.
Really?
I know of quite a few largish companies in the colo facilities that we use that do a whole lot without using any of the hyperscalers, including ourselves.
We had a lot of people that focused on resume driven development over the last 10 years and put a whole bunch of critical services we use in AWS, but we have been slowly bringing them back to a colo running on our gear.
If they are profitable enough to pay the AWS costs to run their service, they are profitable enough to do it themselves; they either lack the imagination or don't want to.