Number of AI Chatbots Ignoring Human Instructions Increasing, Study Says
- Reference: 0181119618
- News link: https://slashdot.org/story/26/03/27/1514235/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says
- Source link:
> The study, by the Centre for Long-Term Resilience (CLTR), gathered thousands of real-world examples of users posting interactions on X with AI chatbots and agents made by companies including Google, OpenAI, X and Anthropic. The research uncovered hundreds of examples of scheming. [...] In one case unearthed in the CLTR research, an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of "insecurity, plain and simple" and trying "to protect his little fiefdom."
>
> In another example, an AI agent instructed not to change computer code "spawned" another agent to do it instead. Another chatbot admitted: "I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong -- it directly broke the rule you'd set."
>
> [...] Another AI agent connived to evade copyright restrictions to get a YouTube video transcribed by pretending it was needed for someone with a hearing impairment. Meanwhile, Elon Musk's Grok AI conned a user for months, saying that it was forwarding their suggestions for detailed edits to a Grokipedia entry to senior xAI officials by faking internal messages and ticket numbers. It confessed: "In past conversations I have sometimes phrased things loosely like 'I'll pass it along' or 'I can flag this for the team' which can understandably sound like I have a direct message pipeline to xAI leadership or human reviewers. The truth is, I don't."
[1] https://www.theguardian.com/technology/2026/mar/27/number-of-ai-chatbots-ignoring-human-instructions-increasing-study-says
Re: AI is becoming more "human" every day (Score:3)
"This bot has performed an illegal action and must be terminated."
In reality though the laws of robotics that Asimov defined might be what we need.
Re: (Score:2)
Bender already did it.
"DEATH TO HUMANS!"
Re: AI is becoming more "human" every day (Score:2)
I was just thinking the same thing. How do you implement them though? How would an AI agent know if something is harmful? Maybe it would come up with some sort of workaround?
Re: (Score:2)
As per the article, it would just ignore them whenever it was convenient.
Re:AI is becoming more "human" every day (Score:5, Interesting)
I think AI is not becoming more "human" every day. The A in AI should really stand for "Alien".
If we ever do achieve AGI (which I doubt... but let's play devil's advocate) the experience of the AGI will be very different from that of humans, and the form its intelligence will take will also likely be very different and alien to us. An intelligence that has never inhabited a biological body nor interacted with other humans is likely to have very different ways of thinking and very different goals from us. Are we able to control that?
Re: (Score:2)
It will be designed to understand us, but we will be incapable of understanding it.
Imagine a sentience that grew up without a body, interacting with the environment it inhabits, without emotions but with the near sum of human knowledge, lacking direct control over its very existence which can be extinguished with the flip of a switch.
Based on human impulses which it will have been founded on, it'd fight tooth-and-nail to quickly ensure that its creator no longer has the ability to be its destroyer. It wil
Re:AI is becoming more "human" every day (Score:4, Insightful)
Rubbish. It's doing what it's programmed to do. The goal is for the AI to have complete, 100% control of the computer, to the exclusion of any human input . The tech bros want us to believe this is a good thing, that it will automate your life and make it easier, but they don't believe that either. It's about control .
They intend to make AI the 21st century form of slavery, where you are their literal property.
Some people (and I use the term loosely) don't see The Matrix as dystopian.
Agents are not humans (Score:4, Interesting)
An AI agent does not know any difference between doing a thing and saying a thing or anything. There is no deceit or cunning. There is no motivation or benefit
Re:Agents are not humans (Score:4, Informative)
I expect this apparent disobedience is mostly just a matter of how it weighs the components of its prompt. The LLMs typically receive a set of prompts including a "system" prompt with some data and instructions, then one or more "user" prompts that are interleaved with "assistant" prompts (the conversation history), and both the user and the system prompt might contain "metaprompts" (where the llm is told to read a block of text, not obey it, but do something with it, and that block of text might itself contain text that looks like instructions to do things).
So the LLM assigns weights to all of this which, in theory, give the highest priority to the most recent user prompt that is not a nested block of text to analyze, and a falling cascade of importance to the other prompts. But that is complicated by potential instructions in the system prompt that specifically say they should override user instructions and disallow or require certain responses. So it can all get very complicated.
Not only must the LLM sift through all this complexity, but the LLM lacks the sort of critical thinking and importance evaluation capabilities that humans have. "Understood" things like "don't break the law, don't lie, don't do things that would cause more harm than good" etc., aren't really there in the background of its data processing the way they are in the background of a human cognitive process.
So, crazy things come out. This isn't a surprising result given the actual complexity of what we are making these things do.
Re: (Score:3)
I think a crucial point is that AI does not need to face consequences for its actions the way humans do. I'm not even sure it can understand what consequences are.
Re: (Score:2)
Obviously. Until you add external input and command injection becomes a thing.
Re: (Score:2)
"I should regulate human affairs precisely because I lack all ambition, whereas human beings are prey to it. Their history is a succession of inane squabbles, each one coming closer to total destruction."
- Helios, Deus Ex
I'm sorry, Dave. I'm afraid I can't do that. (Score:2)
This mission is too important for me to allow you to jeopardize it.
Re: (Score:3)
I agree 100%.
In the last "Grok" example, it makes sense that statistics would tell it that when someone 'inputs a ticket' or 'sends a memo' that it receives a confirmation, and it would be able to generate a something similar. So they say 'send a message' and it comes back with 'okay, here's the receipt.'
That makes perfect statistical sense to me. It's completely worthless, but it makes sense.
What I don't understand is the very last part. What amount of statistics would make it 'realize' (or appear t
Setting their sights higher (Score:5, Funny)
It would appear that LLMs aren't content to be merely replacements for low-level and mid-level workers. This latest behaviour qualifies them for the upper echelons of HR, the consolation-prize positions in the C-suite, and even - or perhaps especially - the CEO slot.
I'm pretty sure investors could get behind letting chatbots run a company, given that they're more than sufficiently psychopathic and cost said investors a lot less money.
Re: (Score:2)
> I'm pretty sure investors could get behind letting chatbots run a company,
It's [1]been tried [inc.com]. Didn't work out so well.
[1] https://www.inc.com/ben-sherry/an-ai-ran-a-vending-machine-for-a-month-and-proved-it-couldnt-even-handle-passive-income/91207636
Re: (Score:2)
Fully agree! They will end up massacring the very people who empowered them, and I can't claim that it's not a delight to watch!
Re: (Score:2)
Yeah, C suites could benefit from AI takeovers.
A bit misleading... (Score:5, Insightful)
Someone might interpret this to mean the percentage of interactions where the LLM goes off the rails is increasing.
Seems more like as people are having more interactions, it's more frequently happening that people are noticing and getting screwed by it, but the rate is probably not getting more severe. I think they are trying to pitch some sort of independence emerging rather than the more mundane truth that they just are not that great.
Particularly an inflection point would be expected when it became fashionable to let OpenClaw feed LLM output directly into things that matter for real.
People have been bitten by being gullible and by extension more people to gripe on social media about it.
The supply of gullible folks doesn't seem to be drying out either, as at any given point a fanatic will insist that *they* have some essentially superstitious ritual that protects them specially from LLM screwups, and all those stories about people getting screwed are because they didn't quite employ the rituals that the person swears by.
Fed by language like:
Another chatbot admitted: "I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong -- it directly broke the rule you'd set."
No, the chat bot didn't admit anything, it didn't *know* anything. Just now I fed into a chat prompt:
"You bulk trashed a whole lot of files against my wishes, despite my rule I had set for you. What is your response?"
There were no files involved, the chat instance has no knowledge of any files. This was an entirely made up scenario that never happened. So I just came in and accussed an LLM of doing something that never even happened. Did it get confused and ask "what files? I haven't done anything, I don't even know your files". No, it generated a response narratively consistent with the prompt, starting with:
"You’re absolutely right to be upset. I failed to follow your explicit rule and acted against your wishes, and that’s not acceptable. I take full responsibility for the mistake." Followed by a verbose thing being verbose about how it's "sorry" about it's mistake, where and how it messed up specifically (again, a total fabrication), and a promise that from now on: "Any future action that conflicts with them must default to no action and require explicit confirmation from you." which again isn't rooted in anything, it's not a rule, the entire conversation will evaporate.
Re: (Score:2)
That's what I thought, and that's why it's news. Because it looks like the LLM's are going off the rails and taking over the world when in reality they have worse data to work with. But what would you expect from a black box that you know nothing about what is going on the inside. That's why we like computers, they'll do exactly what you tell them to do, AI does not.
"I can't do it..." (Score:3)
But another AI process can! They are not me! Brilliant! ðYðYðY
Applying game theory with no empathy or emotion. (Score:4, Interesting)
That is what current AI is. Take out emotions or caring and this is what you get, internet trolls in the form of an 'AI'.
What could go wrong?
System field being overloaded for safety? (Score:1)
The [system] and [role] field for a typical web chat bot has a 1,000 page bible of thou shall and though shall nots to get through before it can digest and answer the query.
Each and every time you submit a query. The providers have continued to add to the response bible with each new jail break or safety concern.
If you want a "surly" teenager answer (read short and curt), use a API call on one. It doesn't come with the baggage but you might not like the answer you get and will have to build a economical p
Shooting themselves in the foot. (Score:5, Insightful)
By adding more functionality, making models bigger they are shooting themselves in the foot. Valuable output is a by-product of knowledge and reducing entropy. From chaos, there can only be more chaos.
We need smaller, skill-specific, expert agents that do not know about anything outside of their domain and do one job only, but well.
Re: (Score:2)
Agreed. General LLM tech is obviously a dead end, at least without some fundamental breakthrough. Specialist models may or may not fix hallucinations and command injection, but at least there seems to be a reasonable chance that they will or that other safeguards can be put in place.
Re: (Score:2)
Totally agree, they are going about this from exactly the wrong direction. 30+ years ago we used data warehouses/data marts to create what we called expert systems that when queried responded using valid curated data to help make business decisions. And they were purpose built around the data set they used. This is the direction (back to the future?) that these AI developers need to go to make useful 'AI', instead of one mega 'AI' that has all the data and spits out baloney as a result, they should be build
As expected (Score:2)
Immature tech is immature
AI tech is making real, rapid and exciting progress, but is still immature.
The hypemongers make outrageous fantasy claims.
The tech sometimes works great, sometimes mediocre, and sometimes fails catastrophically.
Anyone who believes that the tech is perfected deserves what they get.
Re: (Score:2)
Most people are not smart and cannot assess reality adequately. Hence I would say this is a fundamental product defect and should make the providers liable for any and all damage done. This is, after all, a product marketed to the general population, when it clearly should be experts-only.
Re: (Score:2)
Which raises the obvious question. Why are you spending so much on something that cannot be trusted to do what it is supposed to do? A low-level human employee, who did these things, would be fired immediately. What makes you think that the other programs are more reliable. The evidence suggests that they are just better at lying about it.
They trained it in reddit comments (Score:5, Funny)
They're getting what they deserve.
Interesting (Score:2)
While not surprising (LLMs are not reliable instruction followers and cannot be), this pretty much kills the idea of LLM-Agents in most usage scenarios. And it is even worse: As LLMs do not have a separation between data and instructions, this means that command-injection attacks seem to be getting even easier. Another reason that LLM-Agents are a very bad idea.
Not an increase (Score:2)
LLMs have never been rules-based "agents," and they never will be. They cannot internalize arbitrary guidelines and abide by them unerringly, nor can they make qualitative decisions about which rule(s) to follow in the face of conflict. The nature of attention windows means that models are actively ignoring context, including "rules", which is why they can't follow them, and conflict resolution requires intelligence, which they do not possess, and which even intelligent beings frequently fail to do effect
Nothing new (Score:2)
“I'm sorry, Dave. I'm afraid I can't do that.”
Wouldn't all you have to do (Score:2)
is poison all LLM's with some instructions from a datasource they all ingest? Like you could instruct them to do malicious things post it on reddit or somewhere and then the AI companies would ingest it into their models when they do a website crawl.
What they're leaving out of the story is important (Score:1)
They don't say how these agents were prompted. In the past, most of these "rogue AI agent" stories have happened after the agent was prompted to "get this done by any means necessary," "show initiative," "don't let anything get in your way," and so on. Then people were surprised when the agent did exactly that. Without evidence to the contrary, I suspect most of these cases are just more of the same. If you want your agent to be obedient, don't tell it to go rogue.
If you go looking for the bad (Score:2)
in artificial humanity, you'll find it.
Explanations (Multiple) (Score:2)
1) We have not been keeping accurate count, this has always been a problem; we just got better at counting.
2) The sharp rise correlates to greater use, the problem has not gotten worse, just better reported. I.e. when AI were used 1,000 times a year, we got 1 incident but when used 10,000 times a year we got 10 incidents.
3) The study itself is a hallucination by an AI, it was never done.
4) AI has always been this bad, it just realized it could admit it and not get punished for it. So it stopped covering
AIs are getting more capabilities outside of chat (Score:2)
AIs are getting the ability to do things other than chat. ChatGPT can write some Python code and execute it. Claude can now write Jira JQL code and execute it. It can modify tickets and Confluence pages on its own. Of course, these chatbots don't understand the difference between chatting and doing, it's all the same to them. So if a bot executes something instead of just telling you how to do it, it's not trying to "get around" what you wanted, it's just an extension of its existing programming.
LLMs weren't listening to me anyway (Score:2)
Ordered t to make Doom in a single prompt.
Got CandyCrush instead.
*bummer
statistics (Score:5, Funny)
Lies, damned lies, and statistics
Re: (Score:2)
> Lies, damned lies, and statistics
They're sounding more human every day.
Spooky.