AWS Lambda loves charging for idle time: Vercel claims it found a way to dodge the bill
- Reference: 1753957748
- News link: https://www.theregister.co.uk/2025/07/31/aws_lambda_cost_nightmare/
For the uninitiated, AWS Lambda is Amazon's serverless compute platform: handy for short bursts of work, but costly for long-running or latency-prone tasks. Each request runs in its own environment and is billed for the full duration, even when idle. At a small scale, the idle-time burn might be negligible, but at billions of invocations it adds up fast.
The AWS Lambda design is that "for each concurrent request, Lambda provisions a separate instance of your execution environment," according to the [1]cloud giant. Pricing is based on the number of function requests, the duration of each request, and the memory allocated to the function, where memory is configurable between 128 MB and 10,240 MB. No function can run for longer than 15 minutes. There is an [2]open-source tool that measures execution time and cost for a function in order to optimize its Lambda configuration.
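To make the billing model concrete, here is a rough sketch of the arithmetic. The prices are the Arm figures cited later in this article; the workload numbers are invented purely for illustration.

```typescript
// Back-of-the-envelope Lambda cost model using the Arm prices quoted
// below ($0.20 per million requests, $0.048 per GB-hour of duration).
// The workload figures are made up for the sake of the example.
const PRICE_PER_REQUEST = 0.20 / 1_000_000; // USD per invocation
const PRICE_PER_GB_SECOND = 0.048 / 3600;   // USD per GB-second

function lambdaCost(requests: number, avgSeconds: number, memoryGb: number): number {
  const requestCost = requests * PRICE_PER_REQUEST;
  const durationCost = requests * avgSeconds * memoryGb * PRICE_PER_GB_SECOND;
  return requestCost + durationCost;
}

// 10M invocations, 2 seconds each (mostly waiting on a remote API), 1 GB:
// duration is ~$266.67 of the ~$268.67 total. Idle time is the bill.
console.log(lambdaCost(10_000_000, 2, 1).toFixed(2));
```

Even in this toy example, the duration charge dwarfs the per-request charge, which is why idle waiting matters.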
This approach works well for functions that do all their processing on the Lambda instance, but it is wasteful if they spend a lot of time waiting for remote services to complete. Tom Lienard, a Vercel software engineer, has [3]posted about how the company found a solution, apparently by accident. Vercel is the home of Next.js, a React-based framework that the React team recommends as the best implementation of React Server Components (RSC).
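A minimal sketch of the kind of function that hits this problem, assuming a Node-style runtime with a global fetch; the endpoint is a placeholder:

```typescript
// Hypothetical handler illustrating the idle-time problem: nearly all of
// the billed duration is spent awaiting a slow upstream service (the URL
// is a placeholder), while the instance's CPU sits idle.
export async function handler(event: { prompt: string }) {
  const started = Date.now();
  const res = await fetch("https://api.example.com/v1/complete", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt: event.prompt }),
  });
  const body = await res.json();
  // Under per-request billing, this whole wall-clock span is paid for.
  console.log(`billed duration: ${Date.now() - started} ms`);
  return body;
}
```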
The technology requires streaming UI data to the browser. When Vercel started working on this in 2020, AWS Lambda (which Vercel uses extensively to implement functions on its hosting platform) did not support streaming, so the team implemented a TCP-based protocol to create a tunnel between Vercel and AWS Lambda functions. The data to be streamed comes back over this tunnel, and a Vercel Function Router converts it into a stream that is returned to the client.
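Vercel has not published the tunnel protocol, so the following is only a schematic sketch of the relay idea: a router accepts the browser's request, forwards it over a long-lived TCP socket, and pipes whatever comes back out as a chunked HTTP stream. The host, port, and framing are placeholders.

```typescript
import net from "node:net";
import http from "node:http";

// Schematic sketch only: the actual Vercel-to-Lambda tunnel protocol is
// not public. The shape of the idea is that the function holds a
// long-lived TCP connection back to a router, and whatever it emits over
// that socket is relayed to the browser as a chunked HTTP stream.
http.createServer((req, res) => {
  // Placeholder endpoint standing in for the function's tunnel.
  const tunnel = net.connect({ host: "tunnel.internal", port: 9000 }, () => {
    tunnel.write(JSON.stringify({ path: req.url })); // forward the request
  });
  res.writeHead(200, { "Transfer-Encoding": "chunked" });
  tunnel.pipe(res);                    // stream function output to the client
  tunnel.on("error", () => res.end()); // drop the stream if the tunnel dies
}).listen(8080);
```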
Then, "we had a thought," said Lienard. Since the tunnel now exists, "what if we could send an additional HTTP request for a Lambda to process?" – something which Lambda's design does not normally allow.
[Diagram: a second concurrent request being sent to an existing Lambda instance]
This was not simple to implement: the system has to track each AWS Lambda instance's current CPU and memory usage, along with its 15-minute lifetime, and metric tracking had to be added to a Rust-based core running on the instance so that it can refuse requests if necessary. Lienard's post has more details, but the outcome formed the basis of what Vercel calls [4]Fluid Compute, where existing resources are used before new ones are scaled up, and billing is based on actual compute usage. Lienard claimed savings of "up to 95 percent on compute costs."
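Vercel's in-instance core is Rust and its actual heuristics are not public, but the shape of the admission check might look something like this TypeScript sketch; the thresholds, the in-flight cap, and the lifetime margin are all invented for illustration.

```typescript
import os from "node:os";

// Illustrative admission control, not Vercel's code. An instance accepts
// an extra multiplexed request only if it has memory headroom, spare
// concurrency, and enough of its 15-minute lifetime left.
const INSTANCE_DEADLINE = Date.now() + 15 * 60 * 1000; // Lambda's hard cap
const MAX_IN_FLIGHT = 8; // invented cap for the example
let inFlight = 0;

function canAcceptAnotherRequest(): boolean {
  const freeMemRatio = os.freemem() / os.totalmem();
  const lifetimeLeftMs = INSTANCE_DEADLINE - Date.now();
  // Refuse when memory is tight, too many requests are already
  // multiplexed onto this instance, or recycling is imminent.
  return freeMemRatio > 0.25 && inFlight < MAX_IN_FLIGHT && lifetimeLeftMs > 30_000;
}

function tryDispatch(run: () => Promise<void>): boolean {
  if (!canAcceptAnotherRequest()) return false; // router provisions a new instance instead
  inFlight++;
  void run().finally(() => inFlight--);
  return true;
}
```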
Vercel, we also note, charges more than AWS for function usage. At the time of writing, AWS Lambda (Arm architecture) [5]costs $0.20 per million requests, plus from $0.048 per GB-hour of instance usage. Vercel, prior to Fluid Compute, [6]charged $0.60 per million requests and $0.18 per GB-hour.
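Running the same invented workload from the earlier sketch through both price lists shows the scale of that markup (list prices only; whatever platform features Vercel bundles on top are ignored here):

```typescript
// Same made-up workload as before: 10M requests, 2 seconds each, 1 GB.
const gbHours = (10_000_000 * 2 * 1) / 3600;   // ≈ 5,556 GB-hours

const awsBill = 10 * 0.20 + gbHours * 0.048;   // ≈ $268.67
const vercelBill = 10 * 0.60 + gbHours * 0.18; // ≈ $1,006.00
console.log({ aws: awsBill.toFixed(2), vercel: vercelBill.toFixed(2) });
```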
Vercel bill shock caused by functions with slow-returning AI calls
A Vercel customer this week [7]complained on X that "a few weeks ago, I moved ~20 client-side requests to server side (on 1 page). Now my Vercel bill jumped from $300 a month to $3,550 this month, where 99% of it comes from 'serverless function duration.'" The functions, he said, were "GET request to the DB and sending claude/anthropic requests" – exactly the kind of function subject to the idle-time problem described above.
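The post doesn't give enough detail to reconstruct the bill, but a purely hypothetical workload shows how slow AI calls inflate duration charges at Vercel's legacy rates:

```typescript
// All figures invented to show how the bill scales, not the customer's
// actual workload: 20 functions per page view, each awaiting an LLM
// response for ~10 seconds with 1 GB allocated, billed by duration at
// the legacy $0.18 per GB-hour rate.
const fnsPerPage = 20;
const secondsPerFn = 10; // mostly idle, waiting on the model
const memoryGb = 1;
const pricePerGbHour = 0.18;

const gbHoursPerPage = (fnsPerPage * secondsPerFn * memoryGb) / 3600;
const costPerPage = gbHoursPerPage * pricePerGbHour; // ≈ $0.01 per view

console.log((300_000 * costPerPage).toFixed(0)); // ≈ $3,000/month at 300k views
```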
The better news? "turned on fluid compute, already see it helping a ton," he reported. No doubt Vercel still adds a hefty markup to what it pays to AWS; nevertheless, its work on optimizing AWS Lambda does mitigate the cost.®
[1] https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html
[2] https://github.com/alexcasalboni/aws-lambda-power-tuning
[3] https://vercel.com/blog/fluid-how-we-built-serverless-servers
[4] https://vercel.com/blog/introducing-fluid-compute
[5] https://aws.amazon.com/lambda/pricing/
[6] https://vercel.com/docs/functions/usage-and-pricing
[7] https://x.com/_mattwelter/status/1949850488654143932
Re: Congratulations
It sounds to me like they've invented timesharing.
Re: Congratulations
Yep, bypassing the Lambda invocation mechanism and replacing the AWS request dispatch loop in the Lambda with their own does seem like reinventing the wheel. Add in the rest of the invocation stack, load balancers, client APIs etc and you've got a beast of a hack that just seems like it's been done - better - before. I.e. using an established set of VM tools like k8s.
Really it comes down to the AWS Lambda architecture decision to disallow concurrent requests. The only upside I can see is that if this gains traction AWS might revisit that decision. Though I'm not holding my breath, as all that paid-for idle time serves their revenue stream well and allows them to pack more VMs onto their bare metal.
A measure of load?
> I moved ~20 client-side requests to server side (on 1 page). Now my Vercel bill jumped from $300 a month to $3,550 this month, where 99% of it comes from 'serverless function duration.'" The functions, he said, were "GET request to the DB and sending claude/anthropic requests"
Do these people not realize that their $3600/mo bill can be handled by *one* virtual instance at a cost of $30/mo? Seriously, a small instance, with the back-end of your choice (nginx + php? node.js? ruby on rails? what is new these days?) will *easily* handle 20 requests per second. In fact, you could get 200 to 2000 requests per second given the described workload, at $30-40/mo!
Every new start-up's first concern seems to be "We have to SCALE! Our 20 users will be 200 000 by this time next year!!" and they let it kill them by spending 100x as much as they ought to be spending, with no 200 000 users to show for it.
Re: A measure of load?
Assuming you don't count the cost of server administration, and don't count the cost of re-architecting for the planned transition to cloudy services later.
And of course this particular source of requests might only be 20 requests/second, but the overall application might be several orders of magnitude larger than that. That said, I rather think you're right, given that it looks like the entire bill was about $3k.
costly for long-running or latency-prone tasks
Soooo. Um, maybe don't use it for that? Use the proper tool/service instead of a crusty hack?
I have this hammer! And I will get this bolt out!! I know!! I will weld this socket to the hammer!!! New idea!!
If your Lambda functions are spending significant time sitting idle waiting for something else to happen, you're using the wrong tool. Look into Step Functions.
Congratulations
You've invented Kubernetes -- where nodes can last no more than 15 minutes.
Cool, fun, interesting - but when you know you're going to have many, MANY events, many connections, or persistent connections, why not use the proper tool for the job? An API gateway passing to a Kubernetes cluster to schedule the work.
Threads, and Fibers, are such an old thing -- completely forgotten, and being reinvented anew. (Not yet, they're still forgotten - in favor of Yarn and Rope and SpiderWeb and MultiLayerKevlar.) This -- using a fixed-life, expensive compute instance, individually, to handle long-lived workflows (handling multiple requests) by sequencing and process management - is just... why. These systems already exist, and run on infrastructure that doesn't simply disappear out from under you. You're beyond the point of Lambda, and you need to move to shared infrastructure under your own "management" (automated management - API gateways or load balancers to spread and disperse calls, auto-scale instance groups or Kubernetes to assign workload, etc.)