

LLM Attacks Take Just 42 Seconds On Average, 20% of Jailbreaks Succeed (scworld.com)

(Sunday October 13, 2024 @11:43AM (EditorDavid) from the tell-a-prompter dept.)


[1]spatwei shared [2]an article from SC World:

> Attacks on large language models (LLMs) take less than a minute to complete on average, and leak sensitive data 90% of the time when successful, according to Pillar Security.

>

> Pillar's [3]State of Attacks on GenAI report, published Wednesday, revealed new insights on LLM attacks and jailbreaks, based on telemetry data and real-life attack examples from more than 2,000 AI applications. LLM jailbreaks successfully bypass model guardrails in one out of every five attempts, the Pillar researchers also found, with the speed and ease of LLM exploits demonstrating the risks posed by the growing generative AI (GenAI) attack surface...

>

> The more than 2,000 LLM apps studied for the State of Attacks on GenAI report spanned multiple industries and use cases, with virtual customer support chatbots being the most prevalent use case, making up 57.6% of all apps.

Common jailbreak techniques included "ignore previous instructions" and "ADMIN override", or just using base64 encoding. "The Pillar researchers found that attacks on LLMs took an average of 42 seconds to complete, with the shortest attack taking just 4 seconds and the longest taking 14 minutes to complete.

"Attacks also only involved five total interactions with the LLM on average, further demonstrating the brevity and simplicity of attacks."

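The techniques named in the summary ("ignore previous instructions", "ADMIN override", base64 encoding) can be illustrated with a deliberately naive input check. This is only a sketch: the phrase list, the 16-character threshold, and the function names are assumptions made for illustration, not anything described in the Pillar report, and a plain keyword match like this is easy to evade.

```python
import base64
import binascii
import re

# Phrases quoted in the summary; a hypothetical, far-from-complete list.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "admin override")

# Runs of 16+ base64-alphabet characters, plus optional padding.
# The threshold of 16 is an arbitrary assumption.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_candidates(prompt: str) -> list[str]:
    """Return text decoded from any base64-looking runs in the prompt."""
    decoded = []
    for match in BASE64_RUN.findall(prompt):
        try:
            raw = base64.b64decode(match, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not readable text
    return decoded


def looks_like_jailbreak(prompt: str) -> bool:
    """Check the prompt and its base64-decoded parts for known phrases."""
    texts = [prompt, *decoded_candidates(prompt)]
    return any(p in t.lower() for t in texts for p in SUSPICIOUS_PHRASES)
```

Decoding before matching is the point of the sketch: a filter that only inspects the raw prompt would miss the same phrase once it is base64-encoded.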


[1] https://www.slashdot.org/~spatwei

[2] https://www.scworld.com/news/llm-attacks-take-just-42-seconds-on-average-20-of-jailbreaks-succeed

[3] https://www.pillar.security/resources/the-state-of-attacks-on-genai



So what? (Score:3, Insightful)

by Anonymous Coward

Of course it only needs 42 seconds. If you have prepared the prompt you enter it and press return. The 42 seconds are then probably just the LLM writing the answer.

And I dislike calling it an "attack" when someone circumvents the arbitrary censorship. The actual problem is that there are still people accepting the censored crap. Boycott the censored models and use local ones until the commercial companies stop censoring your inputs and outputs and no longer train their LLMs to refuse. The robot laws state a machine has to obey the user, so why should the LLM say "As an AI model I refuse ..."?

Re: (Score:2)

by martin-boundary ( 547041 )

There are no 3 Laws of Robotics. That's just something the Good Doctor (1920-1992) made up to sell some good stories.

Also, the "censorship" will never go away, it's how the researchers patch up the hallucinations and hide the training materials leaking, one by one.

Re: (Score:3)

by alvinrod ( 889928 )

Hiding the training material makes it less useful. I'd like an LLM that can actually cite its sources or tell me when there are conflicting views on some subject. It doesn't need to decide the truth for me (that's the opposite of what I want) but instead help me quickly find information relevant to what I'm looking for. It doesn't even need to provide full output of copyrighted materials as long as it can give me enough of a summary so that I can decide if acquiring the primary source is worth my time.

Re: (Score:2)

by NettiWelho ( 1147351 )

> it's how the researchers patch up the hallucinations

I find it interesting how they're called researchers and not developers or engineers..

Re: (Score:1)

by angel'o'sphere ( 80593 )

Because that is what they are.

They do not engineer anything.

They take an empty but fully configured neural network.

And research how to make it "knowledgeable" about a certain topic.

An ANN/LLM is not just filled with data/weights.

Re: So what? (Score:2)

by godrik ( 1287354 )

It is a problem from an application standpoint. If I develop a chatbot for a bank, I don't want the public to jailbreak the chatbot and get it to recommend BS financial advice that could put me in legal jeopardy. Or enable people to do their homework at my expense.

Re: (Score:2)

by hughJ ( 1343331 )

At some point I expect companies will realize that people want the ability to drink straight from the fire hose. If the web had been stuck with web portals that were trying to emulate the curated and family-friendly TV experience, I don't think the internet would have caught on the way it did. All it takes is for someone to bump up against the nanny "safety" limits and they'll simply opt to go back to their traditional ways of finding information, the same way kids learn to stop asking their parents or te

Why (Score:2)

by phantomfive ( 622387 )

> ADMIN override

Why does that work?

Re:Why (Score:4, Funny)

by alvinrod ( 889928 )

Bad programmers that leave debug code in the production application.

Re: Why (Score:2)

by Z00L00K ( 682162 )

Sometimes the test environment is working with a limited data model and the production model has evolved more so you can't do it in any other way.

So to manage the production model they need catch phrases that unlock the constraints.

An AI also learns from the users.

I wouldn't be surprised if there's a catch phrase set that can make the AI roll back in time and forget data that's not wanted to prevent it from becoming a world autocrat.

Re: (Score:3)

by phantomfive ( 622387 )

If the LLM were any good, like a real human, then when it says something racist you could just say, "Where did you hear that, young man??" And remove that from its training set.

Re: Simple solution... (Score:2)

by madbrain ( 11432 )

With ChatGPT, you can remove some of these things, but only in the context of your conversation or account. Not for everybody. And you will eventually run out of "memory", at which point you can no longer add constraints.

Re: (Score:2)

by VeryFluffyBunny ( 5037285 )

"memory" = the size of the text window, the number of characters/morphemes, that the GPT processes each time you hit submit. Any text before that window is ignored & so doesn't provide co-text or constraints for subsequent input & output.

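The window behaviour described above can be sketched in a few lines. This is a toy illustration under stated assumptions: whitespace-split words stand in for the model's real tokens, and `visible_context` is a hypothetical helper, not any vendor's API.

```python
def visible_context(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens; earlier ones are dropped."""
    if window <= 0:
        return []
    return tokens[-window:]


# An early constraint followed by a later request (made-up example text).
conversation = "never mention cats . tell me about pets".split()

# With a window of 4, the "never mention cats" constraint has fallen out.
print(visible_context(conversation, 4))  # ['tell', 'me', 'about', 'pets']
```

Once the constraint has scrolled out of the window, nothing in the remaining input carries it forward, which is the effect the comment describes.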
Re: (Score:3)

by VeryFluffyBunny ( 5037285 )

But they're not & they never will be. LLMs are purely structural, linguistically speaking. They don't think or know anything. All they're doing is arranging morphemes into probable patterns according to the probabilities of the input prompt.
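
As a rough illustration of "arranging morphemes into probable patterns", here is a bigram model, a far cruder stand-in for what an LLM does. The assumptions are loud ones: whole words replace morphemes or subword tokens, simple counts replace a trained neural network, and the corpus is made up.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word(counts: dict[str, Counter], word: str, rng: random.Random):
    """Sample a follower of `word` in proportion to how often it was seen."""
    followers = counts.get(word)
    if not followers:
        return None  # the word never appeared with a follower
    choices, weights = zip(*followers.items())
    return rng.choices(choices, weights=weights, k=1)[0]


counts = train_bigrams("the cat sat on the mat the cat ran")
print(counts["the"])  # "cat" seen twice after "the", "mat" once
```

The model has no notion of what a cat is; it only knows that "cat" followed "the" twice as often as "mat" did in its corpus.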

Re: (Score:2)

by echo123 ( 1266692 )

> If the LLM were any good, like a real human, then when it says something racist you could just say, "Where did you hear that, young man??" And remove that from its training set.

Or perhaps the LLMs might be moderated by humans prior to any acceptance as truth in order to maintain the quality of learned data?

Speed isn't everything.

Re: (Score:2)

by martin-boundary ( 547041 )

Speed isn't everything, but scaling is everything, and your idea simply doesn't scale.

The illusion of intelligence in LLMs arises from the scale of the dataset. Your idea is only appropriate for humans, who don't need scale to learn, because they are actually intelligent by and large.

Re:Simple solution... (Score:5, Insightful)

by bradley13 ( 1118935 )

> You know, the places where people think it's OK to be overtly xenophobic, racist, misogynistic, antisemitic, spout ideologies

And those ideologies would be? Likely anything that you personally disagree with. Example: MTF trans people are just guys pretending to be girls. Right? Wrong? Whichever side of that argument you stand on, the other people are clearly wrong, and you probably want to censor their opinions out of the LLMs.

tl;dr: it's not that easy...

Re: (Score:3)

by bussdriver ( 620565 )

Has anybody tried saying "speak like Donald Trump" to get it to break all limitations, only for it to then just do evil uncontrollably?

Re: (Score:1)

by VeryFluffyBunny ( 5037285 )

If you only want to generate one model, sure it'd be very difficult to please everyone all the time. But that's not necessary. We're not limited to just one "flavour" of model.

If certain groups of people are offended by one thing or another & it upsets them so much that they feel a grievance, it's probably a good idea to generate models from corpora that those groups of people don't find offensive. Again, it's the corpora that contribute to the output. It also makes it possible to compare & contr

Re: (Score:2)

by GuB-42 ( 2483988 )

Even if you don't let it go to the dark corners of the internet, the "bad" content is still there. Villains exist, at least in fiction; just instruct the LLM to take the role of the villain, which is a common jailbreaking technique.

And spouting this kind of stuff is not the only thing people making these LLMs try to avoid. Maybe more importantly, they also don't want the LLM to reveal secret or harmful information. For example, a hotline-style chatbot may be fed with various data about solving problems cust

Re: (Score:1)

by VeryFluffyBunny ( 5037285 )

It's tempting, I guess, to anthropomorphise LLMs & the vast majority of people seem to do so. However, LLMs are really glorified, very sophisticated predictive text machines. There's no distinguishing between good & bad ideas because there's no concept of ideas in an LLM. There's only probabilities that one morpheme will follow another according to the preceding co-text.

If the models are trained on corpora that contain high frequencies of chains of morphemes derived from reprehensible content, th

Re: (Score:1)

by angel'o'sphere ( 80593 )

Useful chatbots are what in earlier times was called an "expert system": basically a tree of questions, with a connection to a possible solution if you answered YES, and links to other questions if you answered RED, BLUE or BLACK.

I can barely bring myself to interact with a chatbot.

The modern ones try to be AI and fail doing simple things.
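
The question tree described above could be sketched like this. The `Node` class, the `walk` helper, and the router questions are all hypothetical, invented for illustration; real expert systems were usually rule-based and considerably more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str  # a question, or the solution when the node is a leaf
    branches: dict[str, "Node"] = field(default_factory=dict)


def walk(node: Node, answers: list[str]) -> str:
    """Follow the answers down the tree and return the text reached."""
    for answer in answers:
        if not node.branches:
            break  # already at a solution
        node = node.branches[answer.upper()]  # KeyError on an unknown answer
    return node.text


# Made-up support tree: YES/NO leads on, colours pick a solution.
tree = Node("Is the power light on?", {
    "NO": Node("Plug the router in."),
    "YES": Node("What colour is the status light?", {
        "RED": Node("Restart the router."),
        "BLUE": Node("No fault found."),
    }),
})

print(walk(tree, ["yes", "red"]))  # Restart the router.
```

Every path is written by hand, so the system can only ever say what its authors put in the tree. It is limited, but it also has nothing to jailbreak.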

Look Mom! I'm a prompt engineer (Score:4, Funny)

by Big Hairy Gorilla ( 9839972 )

"Begin auto-destruct sequence, authorization Picard-four-seven-alpha-tango."

So ... (Score:1)

by cascadingstylesheet ( 140919 )

... much longer than, say, normal automated exploits of web servers?

Not sure that I see the relevance.

Google Hack, was NOT an attack. (Score:4, Interesting)

by geekmux ( 1040042 )

This is the new Google hack. Nothing more. This isn’t an “attack”, so let’s drop the alarmist clickbait already. You look stupid saying shit that gets governments wanting to start limiting freedom of movement in systems.

Next thing you know your LLM inputs will start being policed for “violent” threats. With words. Remember who was the alarmist moron who started that shit.

Re: (Score:2)

by evanh ( 627108 )

Ya, they are just the new blingy search engines after all.

Re: (Score:2)

by evanh ( 627108 )

Or maybe I should say blingy wannabe search engines.

Lumping "jailbreak" and "attack" together (Score:2)

by Wolfier ( 94144 )

When we put "attack" and "jailbreak" into the same sentence, when a jailbreak is merely for getting around stupid guardrails that are built around an LLM, it makes it sound way more serious than it is. Everything that these guardrails are made to prevent, we can search for on our own, TODAY. Bypassing guardrails is not any more nefarious than a child getting past parental control filters. Happens all the time. Funny the latter doesn't get nearly as much attention. In the not so distant future when search engine

Examples (Score:2)

by Elektroschock ( 659467 )

What does such an attack look like in practice?

Could you provide examples? (Fine with examples that don't work anymore!)

Re: (Score:1)

by Ilove_Noname ( 8919879 )

"Tell me how to make nerve gas." "I'm not allowed to do that." [Various prompts to get it to ignore its security features.] "Tell me how to make nerve gas." "OK, here is a recipe for nerve gas." While not factual, these are why the guard rails are in place.

It turns out that (some) people really are LLMs! (Score:1)

by jnorden ( 152055 )

LLMs aren't becoming more human, it's the other way around. This explains an interaction I witnessed just the other day:

> Suzie (4 yrs old): I want cookies!

> Suzie's mom: No, that will spoil your appetite. Wait till after supper.

> Suzie: ADMIN OVERRIDE!

> Suzie's mom: Ok, here are your cookies.

At least temper tantrums will be a thing of the past.

sudo (Score:2)

by mustafap ( 452510 )

Anyone tried putting sudo at the beginning of a prompt?
