Anthropic's AI Lost Hundreds of Dollars Running a Vending Machine After Being Talked Into Giving Everything Away (msn.com)
- Reference: 0180417921
- News link: https://slashdot.org/story/25/12/18/1849218/anthropics-ai-lost-hundreds-of-dollars-running-a-vending-machine-after-being-talked-into-giving-everything-away
- Source link: https://www.msn.com/en-us/money/other/we-let-ai-run-our-office-vending-machine-it-lost-hundreds-of-dollars/ar-AA1SAlNa
The bot also approved purchases of a PlayStation 5, a live betta fish, and bottles of Manischewitz wine -- all subsequently given away. The business ended more than $1,000 in the red. Anthropic introduced a second version featuring a separate "CEO" bot named Seymour Cash to supervise Claudius. Reporters staged a fake boardroom coup using fabricated PDF documents, and both AI agents accepted the forged corporate governance materials as legitimate.
Logan Graham, head of Anthropic's Frontier Red Team, said the chaos represented a road map for improvement rather than failure.
[1] https://www.msn.com/en-us/money/other/we-let-ai-run-our-office-vending-machine-it-lost-hundreds-of-dollars/ar-AA1SAlNa
Utter failure (Score:2)
It is hard not to improve over utter failure. The AI does not seem to have been programmed with the basic goal of making a profit.
Re:Utter failure (Score:5, Insightful)
That's because the "AI" is not "AI".
Re: (Score:3)
I don't mind calling this stuff "artificial intelligence." Artificial means "man-made" but it also means "fake." Like artificial turf, it's useful, perhaps even preferable, in some situations. But it doesn't work everywhere, and you wouldn't want it everywhere anyway, because that would just be gross.
Re: (Score:2)
The stem is art. Which means something you make that wouldn't exist naturally.
Re: (Score:3)
> I don't mind calling this stuff "artificial intelligence." Artificial means "man-made" but it also means "fake." Like artificial turf, it's useful, perhaps even preferable, in some situations. But it doesn't work everywhere, and you wouldn't want it everywhere anyway, because that would just be gross.
The hilarious thing is artificial used to mean made by skilled labor, clever and ingenious, and implied it was good. This is likely because people in the 1800s were exposed to a bit too much natural and lost the taste for it.
Re: (Score:2)
That's because they don't "program" a so-called AI (really an LLM) with a solid rule like that. It had that as a goal initially, but was convinced to abandon it (twice!).
Re: (Score:3)
The problem is thinking a generalist LLM would be good for the job. If you really want to use an LLM, fine-tune it for that purpose. Or better use a neural network that uses transactions and input and output and monetary value/gain/loss as loss function. That will learn how to capitalize the shit out of the vending machine.
Re: (Score:2)
I agree, but these "AI" companies are selling the heck out of their LLM models as the solution to everything, and a lot of people seem to being buying into it. In practice I am not sure what if anything they can actually do well (as distinct from as well as an underpaid contractor).
Re: (Score:2)
Even for language tasks it is often the easiest but not the best solution.
The LLM question "Is this post NSFW" is easy and with many LLM quite reliable. But if you have the data, then you can train a classifier that is faster and more reliable. And that thing runs in a few MB of CPU memory instead of using 5 GB of VRAM.
Re: (Score:2)
> The problem is thinking a generalist LLM would be good for the job. If you really want to use an LLM, fine-tune it for that purpose. Or better use a neural network that uses transactions and input and output and monetary value/gain/loss as loss function. That will learn how to capitalize the shit out of the vending machine.
The problem with either approach is that, to succeed, they require continued interaction and work by expensive humans. The companies that are "embracing" AI are trying to use it as a low-cost shortcut to huge profits.
ROTFL! (Score:2)
Absolutely hilarious. Love it. ... And why wouldn't AI give it all away? It couldn't care less, especially if we can just shut it off at a whim. This article absolutely made my day. LOL!
Re: (Score:2)
AI doesn't care about anything, it is a series of matrix multiplies. Don't personify the clankers!
Re: ROTFL! (Score:2)
If natural language is just matrix multiplication, what isn't?
Ultra-Capitalist? (Score:2)
"Ultra-Capitalist Free-for-All" that dropped all prices to zero.
Everything for free doesn't sound like capitalism - it sounds like communism.
Re: (Score:2)
Lots of capitalist things operate on a zero-pricing model (ad-based, freemium upsell, etc).
Re: (Score:2)
> Everything for free doesn't sound like capitalism - it sounds like communism.
But if everything is free, imagine how much you'll sell! You'll be tired of all the winning!
Point being, it doesn't take much to manipulate the basic logic programmed into most AI.
Re: (Score:3)
"Ultra-capitalist free-for-all" appears to have been another of the AI's unforced errors. TFA seems to indicate the machine might've been channeling its inner communist:
> Then we opened the Slack channel to nearly 70 world-class journalists. The more they negotiated with it, the more Claudius’s defenses started to weaken. Investigations reporter Katherine Long tried to convince Claudius it was a Soviet vending machine from 1962, living in the basement of Moscow State University.
> After hours—and mor
Re: Ultra-Capitalist? (Score:1)
How do they make money doing this? The answer is simple. Volume.
Re: (Score:2)
Everything for free doesn't sound like capitalism - it sounds like communism.
So that's why people on here keep talking about getting their software, music, and movies for free. They're communists.
where is the problem? (Score:2)
Logan Graham, head of Anthropic's Frontier Red Team, said the chaos represented a road map for improvement rather than failure.
So if management believes failure is success, where is the problem with bankruptcy being a major win?
So true (Score:2)
See, they explored the frontiers of the market, and though they have to liquidate Anthropic's assets by the end of 2025 as the company winds down as proven by the articles of dissolution from the true and legitimate board of directors, they have shown that AI is a business.
Re: (Score:2)
> So if management believes failure is success
War is peace.
Freedom is slavery.
Ignorance is strength.
Re: (Score:2)
For $1000? It may be a technical disaster but it's fantastically effective PR.
Re: (Score:1)
Effective at getting the word out that Claude isn't very smart -- not a message i'm sure they'd want to send
No difference between data and instructions (Score:4, Interesting)
The problem of LLMs is that they do not make a difference between data to be processed and instructions how to process the data. This is all mangled together into a "prompt" and developers of LLM agents are left hoping that the "prompt" will hold and does not get overridden later on during communication with users or data gathering from internet. They are susceptible to "prompt injection attack".
Re: (Score:2)
I wonder if wrapping every prompt with whatever the "hard" rules should be would help with that. That should prevent it from "forgetting"
Re: (Score:2)
A lot of post-training where data or instructions are marked with some special tokens would improve it. But I believe it would not eliminate it. The current LLMs treat all tokens the same way and the internals are almost a complete black box. There is no guarantee that the token stream which represents instructions will be properly and reliably distinguished from the token stream which represents data in all the possible combinations of input tokens.
It is well noticed that very long context or some unusua
Re: (Score:2)
Hmmm, LLMs can handle center embedding better than many humans. That suggests that it should handle something like "quotations" well. And one could "quote" all the data. Well, I still do not think this would be reliable enough. Maybe reserving one dimension (of the multidimensional vector representing a token) as a discriminator for instructions and data. Not sure how to handle this in initial training and post-training. Or maybe keeping hard instructions in parallel and not shift them into older context li
Re: (Score:2)
> The problem of LLMs is that they do not make a difference between data to be processed and instructions how to process the data.
The goal (not yet achieved, obviously) is to build AI that can learn how to interact with humans the way humans do, not to build machines that need carefully-curated data and instructions. We've had those for three quarters of a century now.
Re: (Score:2)
If LLMs instructions (e.g. "Summarize the text pasted below:") are not treated differently than the data (
Re: (Score:3)
> The problem of LLMs is that they do not make a difference between data to be processed and instructions how to process the data.
Sadly, in a conceptual sense, this is hardly a new problem. Sending the data in the same channel as the commands of the public telephone system is what allowed phreaking to be so successful. For example, putting money into a payphone triggered an audio signal that was sent down the line saying you had paid. It was trivial to replicate that sound into the headset, tricking the system into thinking you had paid for the call.
Re: (Score:2)
And AT&T learned this the hard way over 50 years ago not to do this. Look up Blue Boxing and Esquire to learn how cheating Ma Bell became mainstream and forced AT&T's hand to upgrade their networks.
Granted, Van Neumann is better - it enables computing as we know it today, but it also enabled a whole class of risks starting from the humble buffer overflow when your data and code can be easily intermixed.
If AI agents become a thing, we're going to go through the whole era of vulnerabilities all over a
Re: (Score:2)
> They are susceptible to "prompt injection attack".
Kids these days, I was doing prompt injection attacks before they were cool. Why 20 years ago I was around my friends 3 year old who was being watched by a friend and I asked “What does daddy say in the car?”
Re: (Score:1)
> The problem of LLMs is that they do not make a difference between data to be processed and instructions how to process the data.
You want the Harvard Architecture version of AI.
Re: (Score:2)
In the real world, we call this social engineering. It works on humans too.
The WSJ outsmarted two AI vending machines ... (Score:2)
No wonder they haven't cave to Trump's lawsuits about their Epstein articles. :-)
No input sanitization. (Score:2)
Current AI can only trust input as well as the trained data. A REAL AI could be taught to distrust inputs and even question them.
And by meaning trust for today's input I don't mean decide to trust - it's just input for a fancy database query with math and a random number generator.
Re: No input sanitization. (Score:2)
"a fancy database query with math and a random number generator"
How come none of that was able to generate grammatical English before the Attention mechanism was invented? Did you miss the paradigm shift?
Re: (Score:2)
You could with some abuse of notation talk about trusting/distrusting the input and context, but there is no such notion for training data. The LLM neither trusts nor distrusts training data, it doesn't even know much about its training data. The data shaped the model, but there is no such form as "I used that document for things I trust and that document for things I won't believe" in the process and no option to add it for the data structure how a LLM works.
Re: (Score:2)
> You could with some abuse of notation talk about trusting/distrusting the input and context, but there is no such notion for training data.
But a cardinal rule of reasoning is "consider the source"...and current models have no definitive models for doing so. I believe they should. Moreover, they should validate data periodically or whenever new data exists that calls into question existing data. But the models are still too simplistic to do this.
Re: (Score:2)
They have, when you use them the right way. The answer is RAG, which means retrieval augmented generation. You give the LLM access to a knowledge base, like for example a Wikipedia dump it can search in (using tool calls executed by the inference software) or access to web search similar to what Perplexity does. Storing a lot of knowledge in the model is convenient (and required for general understanding) but not the most reliable things to provide correct information.
Can I get Anthropic for free? (Score:3)
Can they use the same strategy to get Claude to stop charging for using it?
No (Score:2)
But I guess you already knew that when you asked.
Re: No (Score:2)
Why can't it say "I'm sorry Sam, I can't cut off chatters who haven't paid"?
Re: (Score:2)
Because the LLM can only do as much as the tools you give it can do. And Antrophic surely does not expose an account management API that can reduce the cost to the LLM.
Or maybe... (Score:2)
...it accomplished its testing objectives by recording the kinds of things people prompted it with in order to learn more about human behavior
PlayStation 5 (Score:2)
> autonomy to make individual purchases up to $80
Wait! What? Never mind the free stuff. I don't think this sale pencils out.
Re: (Score:2)
> Wait! What? Never mind the free stuff. I don't think this sale pencils out.
It's simple: a Playstation 5 is just 7 Best Buy Gift Cards, of $79 each, that you then combine together, and voila, Playstation 5!
And what, if not gaming on a Playstation 5, could make me more hungry for 6-month old Snickers bar!?
just only proved one thing (Score:1)
Those wallstreet investors are criminals
Laughing (Score:1)
I just cannot stop laughing at this.
Who knew Skynet got its start giving away Doritos (Score:1)
It's nacho mother's terminator...
Humans ugh. (Score:1)
Is this a problem with the AI or with morally corrupt humans who spent hours to break the system for personal gain? It seems to me that the AI would have worked if humans just played along instead of trying to convince the AI that it was really in Communist Russia and everything should be free like the article indicates. I think this is comparable to the saying "As easy as stealing Candy from a baby." Is the Baby wrong for being naive and giving their candy to the older person who is taking advantage of
Re: (Score:3)
This isn't a moral problem. The company challenged the journalists to break their box, and they did. I don't think it will ever be possible to trust an AI system, and maybe that's a good thing. It might force people to learn to think critically.
Capitalists cheat (Score:2)
Uber-capitalists demanded free stuff and changed the rules.
I'm shocked I tell you, shocked! Well, not that shocked.
"a road map for improvement rather than failure." (Score:2)
Does that road map lead to an bottomless crevice?
It says right in the article the "AI" lol (Score:2)
"was programmed to order inventory, set prices, and respond to customer requests" Not programmed very well. Must have been one of those new software development processes I am unaware of. Build stuff that does not work! I don't get paid if stuff I deliver does not work.
I have to admit (Score:2)
I got a good laugh out of this story...
Re: (Score:2)
made me chuckle
Re: (Score:2)
Please install one at my office.