Apple Study Reveals Critical Flaws in AI's Logical Reasoning Abilities

(Tuesday October 15, 2024 @05:30PM (msmash) from the fault-in-our-stars dept.)

Reference: 0175259161
News link: https://apple.slashdot.org/story/24/10/15/1840242/apple-study-reveals-critical-flaws-in-ais-logical-reasoning-abilities
Source link:

Apple's AI research team has uncovered [1]significant weaknesses in the reasoning abilities of large language models, according to a newly published study. MacRumors:

> The study, [2]published on arXiv [PDF], outlines Apple's evaluation of a range of leading language models, including those from OpenAI, Meta, and other prominent developers, to determine how well these models could handle mathematical reasoning tasks. The findings reveal that even slight changes in the phrasing of questions can cause major discrepancies in model performance that can undermine their reliability in scenarios requiring logical consistency.

> Apple draws attention to a persistent problem in language models: their reliance on pattern matching rather than genuine logical reasoning. In several tests, the researchers demonstrated that adding irrelevant information to a question -- details that should not affect the mathematical outcome -- can lead to vastly different answers from the models.

[1] https://www.macrumors.com/2024/10/14/apple-study-reveals-flaws-in-ai-reasoning/

[2] https://arxiv.org/pdf/2410.05229
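The perturbation the study describes can be illustrated with a toy template, in the spirit of (but not taken from) the paper's benchmark: one grade-school word problem is instantiated with different names and numbers, and optionally padded with a sentence that has no bearing on the answer. The template, the name list, and the `make_variant` helper below are hypothetical. Because the ground truth is computed from the numbers alone, a model whose answer changes across variants is reacting to the wording, not the arithmetic.

```python
import random
from dataclasses import dataclass

# Hypothetical template (not from the paper): only names and numbers vary,
# the arithmetic structure of the problem stays the same.
TEMPLATE = (
    "{name} picks {a} apples on Friday and {b} apples on Saturday. "
    "On Sunday, {name} picks twice as many apples as on Friday.{noise} "
    "How many apples does {name} have?"
)

# Distractor sentence: it sounds relevant but does not change the count.
NOISE = " Five of the apples picked on Sunday were a bit smaller than average."

NAMES = ["Oliver", "Priya", "Mateo", "Aiko"]


@dataclass
class Variant:
    question: str
    answer: int


def make_variant(rng: random.Random, with_noise: bool = False) -> Variant:
    """Instantiate the template with random names/numbers plus its true answer."""
    name = rng.choice(NAMES)
    a = rng.randint(10, 60)
    b = rng.randint(10, 60)
    question = TEMPLATE.format(name=name, a=a, b=b, noise=NOISE if with_noise else "")
    # The ground truth ignores the distractor entirely: Friday + Saturday + Sunday.
    answer = a + b + 2 * a
    return Variant(question, answer)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        v = make_variant(rng, with_noise=True)
        print(v.question, "->", v.answer)
```

Feeding such variants to a model and comparing its replies with `answer` would be the basic shape of this kind of experiment; how the paper actually builds and scores its templates is described in the arXiv PDF linked above.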

Uh - duh? (Score:2, Redundant)

by peterww ( 6558522 )

AI does not reason. It predicts word ordering. Reasoning requires knowledge bases with semantic knowledge and analysis. Word ordering just puts jumbles of symbols in order.

Re: (Score:3)

by tysonedwards ( 969693 )

An LLM is: what if I gave my smartphone keyboard autocomplete unlimited resources and trained it on everything ever written by anyone? Just like, money is no issue, give it all the processing power and memory and data... what could it do?

Turns out, a lot. But, it is still fundamentally limited by the whole starting point of building the best auto-complete.
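The autocomplete analogy can be made concrete with the smallest possible "language model": count which word follows which in some text and always suggest the most frequent follower. This is a toy bigram sketch with an invented corpus; production LLMs use neural networks over subword tokens and far longer contexts, so it illustrates the prediction objective, not the mechanism.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for each word, how often each other word follows it."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], word: str, length: int = 3) -> list[str]:
    """Greedily append the most frequent next word, like a phone keyboard."""
    out = []
    current = word.lower()
    for _ in range(length):
        if current not in follows:
            break  # this word was never seen followed by anything
        # Ties go to the follower that was seen first.
        current = follows[current].most_common(1)[0][0]
        out.append(current)
    return out


corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigrams(corpus)
print(autocomplete(model, "the"))  # ['cat', 'sat', 'on']
```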

Re: (Score:1, Troll)

by Moryath ( 553296 )

"Trained" is itself the wrong terminology. "Training" implies learning, which implies intelligence. LLMs are a giant statistical-probability database with an impressive depth of connection between each individual tokenized node, but nowhere in there does any actual intelligence or reasoning ability exist.

The whole term "artificial intelligence" is the problem. It, and the use of terms like "training," lead people to anthropomorphize what they shouldn't.

Re: (Score:2)

by i kan reed ( 749298 )

Bit of an old man yelling at clouds here. Programming relies on a lot of metaphors to help us understand the purpose of things.

I do not think semaphores are using little colored flags to control my threads,

which I do not believe to be strings bound on spools to divide my jobs,

which I do not believe to be gainful employment on the part of my code.

And:

Objects are not things I can hold.

Models are not toy planes

Servers don't bring you your food

Links are not part of a chain

Calling functions does not require a p

Re: Uh - duh? (Score:2)

by ArmoredDragon ( 3450605 )

> The whole term "artificial intelligence" is the problem

It's a term that never really had any practical meaning other than a program that responds to inputs. In the 80s and 90s, AI was your chess opponent, which basically did fancy heuristics with a static ruleset. It never was intelligent, and still isn't. When most companies describe their product as AI, it's not even LLM, it's just a variation of the ol' chess opponent.

Though I'd have to slightly disagree about your training comment. For LLM, yes, it's not training so much as just adding data points for the d

Re: Uh - duh? (Score:2)

by ArmoredDragon ( 3450605 )

Bleh, correction: Solely NN

Re: (Score:2)

by Ksevio ( 865461 )

"Training" is an accurate and correct term to use here. Not only is it the common terminology in the machine learning field for decades, but it describes what is happening.

LLM models aren't just databases; they're weighted neural networks that will produce a given result based on a given input. The training is to adjust the weights to properly produce the result. Without the training, the model produces gibberish.
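The training described here, adjusting weights until the network maps inputs to the intended outputs, can be shown on a single artificial neuron learning logical AND. This is a minimal sketch using plain gradient descent on cross-entropy loss; the learning rate, epoch count, and zero starting weights are arbitrary illustrative choices, and LLM training involves vastly more weights and more elaborate optimizers.

```python
import math

# Truth table for logical AND: (inputs, target output).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: list[float], b: float, x: tuple[float, float]) -> float:
    """Output of one neuron: weighted sum of the inputs squashed into (0, 1)."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(epochs: int = 5000, lr: float = 0.5) -> tuple[list[float], float]:
    """Adjust weights by gradient descent on the logistic (cross-entropy) loss."""
    w, b = [0.0, 0.0], 0.0  # untrained: every input yields 0.5
    for _ in range(epochs):
        for x, target in DATA:
            error = predict(w, b, x) - target  # gradient of the loss w.r.t. z
            w[0] -= lr * error * x[0]
            w[1] -= lr * error * x[1]
            b -= lr * error
    return w, b


w, b = train()
for x, target in DATA:
    print(x, target, round(predict(w, b, x), 3))
```

Before training, the neuron answers 0.5 for every input, which is the one-neuron version of gibberish; after training, its outputs sit close to the targets.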

Reason (Score:1)

by JBMcB ( 73720 )

There is no reasoning. It's pattern matching based on keywords and weights feeding into Markov chains. Most LLMs also have some inferencing ability hardwired in there by humans, but they don't make those inferences on their own.

Re:Reason (Score:5, Insightful)

by Baron_Yam ( 643147 )

The funny thing is... Somehow our ability to reason is an emergent property of weighted connections in a network. Because we don't understand how that happens, we don't know why it isn't happening with the AI we have created, or if it's even possible with the setups we're using. We also don't know if it's impossible for a sufficiently complex version of an existing AI system to do it.

Probably impossible. I suspect there's more than just 'embiggen it and it will happen'.

Re: (Score:2)

by alvinrod ( 889928 )

A traditional neural network isn't the best approximation of actual neurons, so we don't get something that works in quite the same way. The hardware we run these programs on isn't like an actual brain either. However, when we do create software that actually models a physical brain, it does behave like one. There have been some studies conducted to recreate a worm brain in software as it has a smaller number of neurons and can be easily mapped out since dissecting worms isn't going to raise many eyebrows.

Idiocracy bucket problems (Score:2)

by goombah99 ( 560566 )

If I gave you a 5 gallon bucket and a 2 gallon bucket, how many buckets did I give you?

Dupe di-dupe di-dupe di-dupe dupe dupe (Score:3)

by gweihir ( 88907 )

And please stop claiming "faults" in "LLM reasoning abilities". LLMs have no reasoning abilities and pattern matching is not a valid substitute.

Re: (Score:1)

by Whateverthisis ( 7004192 )

I think it's fair for Apple to point this out and use "LLM reasoning abilities". You're right in what you're saying, but when you have people who are claiming they're on the path to making "General Artificial Intelligence", or we're "4 years away from AI that will eliminate 50% of jobs", the suggestion is that the AI is actually able to reason; that it's truly intelligent. So it's good that someone with the right resources and the ability to know what's going on can use the language of those hyping AI and

Re: (Score:2)

by gweihir ( 88907 )

Hmm. I do admit I sometimes forget the low "reasoning ability" level many people operate on.

Re:Dupe di-dupe di-dupe di-dupe dupe dupe (Score:4, Interesting)

by war4peace ( 1628283 )

Question is, do Slashdot editors have enough reasoning abilities, considering the dupefest here?

Re: (Score:2)

by sconeu ( 64226 )

Maybe Apple could do a study revealing the critical flaws in Slashdot editors' "reasoning" abilities?

Re: (Score:2)

by phfpht ( 654492 )

NO. But being just as bad as humans is not a validation of generative AI.

Re: (Score:2)

by toxonix ( 1793960 )

The headline/description is garbage.

but

Apple needs to temper people's expectations when Sam Altman is writing things like:

" ... it’s very possible that creativity and what we think of us as human intelligence are just an emergent property of a small number of algorithms operating with a lot of compute power"

and

"We decry current machine intelligence as cheap tricks, but perhaps our own intelligence is just the emergent combination of a bunch of cheap tricks."

Even Mira Murati's papers point in th

This needed a study? (Score:1)

by nightflameauto ( 6607976 )

I expect we'll see a response from Sam Altman and his ilk within days talking about how reasoning ability is overrated anyway, and the artificial intelligence is superior to supposed "real" intelligence on such a level that we simply aren't equipped to understand the reasoning ability of such a superior creation.

My god, this is stupid. Reasoning ability in LLMs? You might as well say every database in existence has reasoning ability just because you can type a somewhat English-looking phrase in (SELECT * FROM $F

Re: (Score:1)

by iAmWaySmarterThanYou ( 10095012 )

Sorry, but you went there. I couldn't resist.

[1] https://xkcd.com/327/
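The xkcd strip linked above is about SQL injection: a query assembled by pasting user input into SQL text lets that input rewrite the query. A minimal sketch with Python's built-in `sqlite3` module shows the same flaw without dropping any tables, along with the usual fix of passing input as a bound parameter; the `students` table and its rows are invented for the example.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create an in-memory table holding a couple of made-up students."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT)")
    conn.executemany("INSERT INTO students VALUES (?)", [("Alice",), ("Bob",)])
    return conn


def find_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT name FROM students WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Safe: the input is bound as a value and never parsed as SQL.
    return conn.execute("SELECT name FROM students WHERE name = ?", (name,)).fetchall()


conn = setup()
attack = "x' OR '1'='1"
print(find_unsafe(conn, attack))  # the WHERE clause is always true: every row comes back
print(find_safe(conn, attack))    # no student has that literal name: []
```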

Non deterministic (Score:1)

by cygnusvis ( 6168614 )

The non-deterministic nature of AI language models makes it impossible to make guarantees, and their results cannot be insured financially or legally. For example, if the AI sends a one-in-a-million mass email that is highly offensive, the AI producer/maintainer probably has language stating they're not liable.

Re: (Score:3)

by dfghjk ( 711126 )

What "non deterministic nature"? And why are "guarantees" of "results" important?

"For example, If the AI sends a 1 in a million mass email that is highly offensive, the AI producer/maintainer probably has language stating they're not liable."

They'll have that anyway. It's a problem of legal accountability, not a characteristic of LLMs that you cannot accurately describe.

Re: (Score:2)

by omnichad ( 1198475 )

I believe that they are fully deterministic, but generation runs are seeded with random numbers intentionally.

Re: (Score:1)

by iAmWaySmarterThanYou ( 10095012 )

2+2=4. Fully deterministic. Always yields same results given 2+2=? as input.

Vs.

LLM given same user input multiple times yielding different results each time? Non-deterministic.

By definition if a random number generator is a key part of your algorithm, it is not deterministic. This should be self-evident.

Re: (Score:2)

by omnichad ( 1198475 )

I'll use image generators as an example because even though it's a different algorithm, they work in a lot of the same ways.

You put in a text prompt and get a different image every time, right? No. You can re-run the same prompt with the same seed and get exactly the same picture out of it. You just have to have control over the model to enable that. So maybe not Bing Image Generator but definitely Stable Diffusion.

It's pseudorandom numbers, so yes - it's deterministic.
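The seeding point can be checked with a toy sampler that draws tokens from a fixed probability distribution using a pseudorandom generator: with the same seed the "random" choices repeat exactly, and a temperature of zero (greedy decoding) uses no randomness at all. The vocabulary and scores are invented, and real inference stacks can add other sources of run-to-run variation (for example, floating-point differences across hardware or batch sizes), so this sketch shows the principle rather than a guarantee about any particular service.

```python
import math
import random

# Invented next-token scores (logits) for a tiny vocabulary.
VOCAB = ["cat", "dog", "fish", "bird"]
LOGITS = [2.0, 1.5, 0.5, 0.1]


def next_token(rng: random.Random, temperature: float) -> str:
    """Pick a token: greedy at temperature 0, softmax sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, no RNG involved.
        return VOCAB[max(range(len(VOCAB)), key=lambda i: LOGITS[i])]
    scaled = [x / temperature for x in LOGITS]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(seed: int, n: int = 8, temperature: float = 1.0) -> list[str]:
    rng = random.Random(seed)  # pseudorandom: fully determined by the seed
    return [next_token(rng, temperature) for _ in range(n)]


print(generate(seed=42))
print(generate(seed=42))                 # identical to the line above
print(generate(seed=7))                  # usually different from seed 42
print(generate(seed=7, temperature=0))   # always eight copies of 'cat'
```

Seeding from something like the current time instead of a fixed number would make each run differ, but the sampler itself stays a deterministic function of its seed.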

Re: (Score:2)

by alvinrod ( 889928 )

All computer programs are deterministic unless they use external phenomena to control their execution. Just because they're so big and complex that we can't easily work out their state doesn't mean they've stopped being deterministic.

Re: (Score:1)

by iAmWaySmarterThanYou ( 10095012 )

$x = time();

print $x;

Deterministic?

LLM using time() or rand()... deterministic?

Again? (Score:2)

by ebcdic ( 39948 )

They did the same a couple of days ago:

[1] https://apple.slashdot.org/story/24/10/13/2145256/study-done-by-apple-ai-scientists-proves-llms-have-no-ability-to-reason

Re: (Score:2)

by war4peace ( 1628283 )

Apple is thorough.

Slashdot editors, not so much.

Re: (Score:2)

by bill_mcgonigle ( 4333 ) *

"Hey, LLM, has this article been posted already?"

See, AI could improve /. Maybe it's only as smart as a cat, but if that cat can spot dupes, that's something editors miss.

Humans use cats to hunt mice too. Not because cats are good at anything else but being mean, but they excel at that. Same with LLM pattern matching.

Apple always shits on tech they're way behind on - until they "revolutionize" it and it's the next best thing. Remember when fanbois were worshiping the Lightning Cable?

They'll snap-to on AI

Not able to reason. (Score:2)

by eriks ( 31863 )

"Generative AI" is simply not capable of what we would universally consider reasoning. LLMs and other "reflexive" pattern-matching systems may be a stepping stone on the way to AGI, or, they may be a cul-de-sac, and won't have anything at all to do with AGI, if such a thing ever comes to be.

I really question this. (Score:2)

by javaman235 ( 461502 )

I mean, take any formal math proof. You have a set of transformations you can make to existing statements, a set of existing statements, and you apply them to get the form you want. All of this is realizable within a neural network, so any output can only be the product of an input plus a transformation.
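The picture described here (a set of statements plus transformation rules, applied until the target form appears) is essentially forward chaining. A minimal sketch, assuming propositional facts and Horn-clause rules of the form "if all premises hold, conclude X"; the fact and rule names are hypothetical. Whether a neural network actually carries out such steps internally is exactly what the thread disputes; the code only shows what the symbolic procedure looks like.

```python
# Each rule: (set of premises, conclusion). All names are illustrative.
RULES = [
    ({"a_is_even", "b_is_even"}, "a_plus_b_is_even"),
    ({"a_plus_b_is_even"}, "a_plus_b_divisible_by_2"),
]


def forward_chain(facts: set[str], rules: list[tuple[set[str], str]]) -> set[str]:
    """Apply rules repeatedly until no new statement can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires only when every premise is already established.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"a_is_even", "b_is_even"}, RULES)
print(sorted(derived))
```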

More Discussion on this from 2 Days Ago (Score:2)

by serutan ( 259622 )

[1] https://apple.slashdot.org/story/24/10/13/2145256/study-done-by-apple-ai-scientists-proves-llms-have-no-ability-to-reason

The critical flaw is that... (Score:2)

by MpVpRb ( 1423381 )

...they have NO reasoning ability

It's all statistics and clever math

I tried it, they are right (Score:1)

by nospam007 ( 722110 ) *

I ask how much is 3+5?

If I change just one character, the '3' to '4', I get a completely different answer.

Novel thought (Score:2)

by fluffernutter ( 1411889 )

When they learn merely by the words that other people have posted, their 'reasoning' can only be a logical calculation within the domain of what other people have said. But 'reasoning' in the way the term is meant means novel thought, and therein lies the rub.

IQ Test (Score:2)

by RossCWilliams ( 5513152 )

Have AI take an IQ test. That's the way we determine "intelligence" in humans. If you want to define it differently you need to come up with a different measure. Or admit you are arguing about an ill-defined term that mostly is used to describe how well someone's thinking conforms to a particular social class.
