ChatGPT Just Got 'Absolutely Wrecked' at Chess, Losing to a 1970s-Era Atari 2600 (cnet.com)
- News link: https://games.slashdot.org/story/25/06/14/0421247/chatgpt-just-got-absolutely-wrecked-at-chess-losing-to-a-1970s-era-atari-2600
- Source link: https://www.cnet.com/tech/services-and-software/chatgpt-just-got-absolutely-wrecked-at-chess-losing-to-a-1970s-era-atari-2600/
> By using a software emulator to run Atari's [2]1979 game Video Chess, Citrix engineer Robert Caruso said he was able to set up a match between ChatGPT and the 46-year-old game. The matchup did not go well for ChatGPT. "ChatGPT confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were — first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations," Caruso wrote [3]in a LinkedIn post.
>
> "It made enough blunders to get laughed out of a 3rd-grade chess club," Caruso said. "ChatGPT got absolutely wrecked at the beginner level."
"Caruso wrote that the 90-minute match continued badly and that the AI chatbot repeatedly requested that the match start over..." CNET reports.
"A representative for OpenAI did not immediately return a request for comment."
[1] https://www.cnet.com/tech/services-and-software/chatgpt-just-got-absolutely-wrecked-at-chess-losing-to-a-1970s-era-atari-2600/
[2] https://en.wikipedia.org/wiki/Video_Chess
[3] https://www.linkedin.com/posts/robert-jr-caruso-23080180_ai-chess-atari2600-activity-7337108175185145856-HSP0/
AI (Score:4, Insightful)
This is only news for the kind of people who refer to large language models as "AI".
Unfortunately, that's quite a lot of people.
.
Re: (Score:2)
Stop the vocab fight! It's pointless and useless! Every known definition of "AI" and even "intelligence" has big flaws. I've been in hundreds of such debates: No Human nor Bot Has Ever Proposed A Hole-Free Definition of "Intelligence". So go home and shuddup already!
Re: (Score:1, Interesting)
It isn't that we know and can readily define intelligence in a clear and precise manner.
It is that we know when we're looking at something that clearly isn't intelligent and they call it that anyway.
LLMs are clearly not intelligent, and it is inappropriate to apply any phrase with the word "intelligence" or variations thereof when describing such systems.
Re: (Score:2)
I guess defining intelligence is like what we used to say about porn. "I know it when I see it" ....tee hee....
Actually, I argue that this is the problem with language. It's vague. Ideas usually start vague, and only afterward do you drill down and add details. Like writing pseudo-code or specifications for code. This function is called XYZ. It does (blah blah blah...blah blah blah....etc, etc, etc, ad infinitum).
It is hard to be precise. For example, how do you define "art"? How about "good"? What is "goo
Re: (Score:3)
This is my pet peeve. AI has been turned into a marketing term for things that are not the traditional definition of AI.
The term is now corrupted beyond all hope of recovery.
I'm distressed at how much tools like ChatGPT favor seeming intelligent and capable as an illusion, even when lying to you. I've even caught it making a mistake and then blaming me for the mistake, or pretending it meant to do it wrong as a test step. The conman element is real, even down to the tool itself.
Re: (Score:2)
That are not the traditional definition of AI.
What IS the traditional definition of AI?
It's been all over the place for years. Back when I was a student, in the very early 2000s, I had a course on AI in the same module as the neural nets lectures. It contained such topics as alpha/beta pruning, A* search, decision trees, expert systems, that kind of thing.
Further in the past neural networks were definitely considered AI, but by 2000 they were considered as "ML" which was generally treated as something separa
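For anyone who never took that course, here is a minimal sketch of the alpha/beta pruning mentioned above. The game tree and its leaf values are invented for illustration (a nested list where leaves are static evaluation scores):

```python
# Minimax with alpha/beta pruning over a nested-list game tree.
# The tree and its leaf values are made up for this example.

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):
        return node  # leaf: a static evaluation score
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # cutoff: the minimizing player avoids this branch
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # cutoff: the maximizing player avoids this branch
    return value

# Max to move at the root, min to move at the second level:
tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # → 3
```

The cutoffs skip branches that cannot change the result, which is what makes full-width search affordable; it is the core of every classic chess program, including the one in the Atari cartridge's lineage.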
Re: (Score:1)
AI: algorithm implemented.
Re: (Score:3)
> This is only news for the kind of people who refer to large language models as "AI".
> Unfortunately, that's quite a lot of people.
> .
Old MacDonald had an LLM farm -
AI, AI, Oh!,
And on that farm he had a nuclear plant,
AI AI Oh!
With a hallucination here, a wrong answer there, here a fault there a fault, everywhere a bad answer.
Old MacDonald had an LLM farm
AI AI Oh!
Re: (Score:2)
> This is only news for the kind of people who refer to large language models as "AI".
So, ... everyone including people working in the field of AI?
Mocking their God (Score:2)
Some people so want to believe that a useful information retrieval system is a superintelligence.
The rest of us aren't surprised that an interesting search engine isn't good at chess.
Re: (Score:3)
> Some people so want to believe that a useful information retrieval system is a superintelligence.
> The rest of us aren't surprised that an interesting search engine isn't good at chess.
That very nicely sums it up. Obviously, you have to be something like a sub-intelligence to think that LLMs are superintelligent. To be fair, something like 80% of the human race cannot fact-check for shit and may well qualify as sub-intelligence. Especially as most of these do not know about their limitations due to the Dunning-Kruger effect.
Re: (Score:2)
Hmm:
- confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were
- first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations
- repeatedly requested that the match start over
That all rings a bell somewhere - confusion, blaming everything else for the errors, repeatedly requesting a mulligan. That seems familiar.
An Atari 2600 uses a MOS 6507 with 128 BYTES of RAM (Score:1)
And that beat a state of the art AI? So much for intelligence!
No surprise (Score:3)
To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities. All they can do is statistical predictions based on their training data. Hence any task that requires actual reasoning like chess (because chess is subject to state-space explosion and cannot be solved by "training" alone), is completely out of reach of an LLM.
The only thing surprising to me is that it took so long to come up with demonstrations of this well-known fact. Of course, the usual hallucinators believe (!) that LLMs are thinking machines/God/the singularity and other such crap, but these people are simply delulu and have nothing to contribute except confusing the issue. Consider the little pathetic fact that about 80% of the human race is "religious" and the scope of _that_ problem becomes clear. It also becomes clear why a rather non-impressive technology like LLMs is seen as more than just better search and better crap, when that is essentially all it has delivered. Not worthless, but not a revolution either, and the extreme cost of running general (!) LLMs may still kill the whole idea in practice.
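The state-space explosion point can be made concrete with Shannon's classic back-of-envelope numbers: roughly 35 legal moves per position and roughly 80 plies per game (both are rough averages, not exact figures):

```python
# Shannon-style back-of-envelope estimate of the chess game tree:
# ~35 legal moves per position, games ~80 plies long (rough averages).
branching, plies = 35, 80
games = branching ** plies
print(f"roughly 10^{len(str(games)) - 1} possible games")
```

That lands in the ballpark of Shannon's famous 10^120 estimate, and is vastly more than the roughly 10^80 atoms in the observable universe, which is why no amount of training data amounts to "memorizing chess".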
Re: (Score:2)
> To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities
A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.
Re:No surprise (Score:5, Interesting)
>> To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities
> A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.
Indeed. There are a few facts from sociology. Apparently only 10-15% of all humans can fact-check, and apparently only around 20% (including the fact-checkers) can be convinced by rational argument when the question matters to them (this goes up to 30% when it does not). Unfortunately, these numbers seem to be so well established that there are no current publications I can find. It may also be hard to publish about this. This is from interviews with experts, personal observations, and observations from friends who also teach at academic levels. ChatGPT at least confirmed the 30% number but sadly failed to find a reference.
Anyway, that would mean only about 10-15% of the human race has active reasoning ability (can come up with rational arguments) and only about 20-30% has passive reasoning ability (can verify rational arguments). And that nicely explains some things, including why so many people mistake generative AI and in particular LLMs for something they are very much not and ascribe capabilities to them that they do not have and cannot have.
Shall we play a game? (Score:2)
Would the average Slashdot reader beat the Atari 2600?
Re: (Score:2)
Probably not.
I didn't know about the Atari chess game until a couple of weeks ago, when an old colleague posted on FB that he was struggling with it on the lowest level.
But I did pretty well against Fritz a long time ago, running on a Compaq Armada 7800.
Uses poorly suited 4o model (Score:1)
Of course, digging into the details you find he used gpt-4o for this, which is years behind frontier models like o3 or Gemini 2.5, which use reasoning to think through their responses and can even write Python as part of this process that likely would compete with the Atari system despite not being designed to do so. AI can be criticized but this ain't it. It's a whole article written up just to cover some guy's LinkedIn post chasing clout and likes. Maybe the reasoning models wouldn't
But the Atari can't make excuses (Score:2)
...for fucking up like ChatGPT can. Take that Atari!
> [ChatGPT] first blaming the Atari icons as too abstract...continued badly and that the AI chatbot repeatedly requested that the match start over
Score one for Kathe Spracklen (Score:3)
And her algorithm
Who came up with the idea to let an LLM play chess? (Score:2)
An LLM is one of the worst AIs for playing chess. I wouldn't be surprised if you'd do better with some greedy algorithm (which is not a good idea in general).
Not all AI is the same. LLMs are text generators, not chess players.
ChatGPT is not a chess engine (Score:5, Insightful)
ChatGPT is not a chess engine. Comparing it to an actual chess system is missing the point. The thing that's impressive about systems like ChatGPT is not that they are better than specialized programs, or that it is better than expert humans, but that it is often much better at many tasks than a random human. I'm reasonably confident that if you asked a random person off the street to play chess this way, they'd likely have a similar performance. And it shouldn't be that surprising, since the actual set of text-based training data that corresponds to a lot of legal chess games is going to be a small fraction of the training data, and since nearly identical chess positions can have radically different outcomes, this is precisely the sort of thing that an LLM is bad at (they are really bad at abstract math for similar reasons). This also has a clickbait element given that substantially better LLM AIs than ChatGPT are now out there, including GPT 4o and Claude. Overall, this comes across as people just moving the goalposts while not recognizing how these systems keep getting better and better.
Re:ChatGPT is not a chess engine (Score:5, Insightful)
ChatGPT has flexibility, but it is inferior to both humans and specialized algorithms in nearly all cases.
The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.
Re:ChatGPT is not a chess engine (Score:4, Insightful)
> The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.
With the little problem that you have to feed it so much electricity that paying that wage might still well turn out to be cheaper, even at western standards. At the moment LLMs burn money like crazy and it is unclear whether that can be fixed.
Re: (Score:2)
>> The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.
> With the little problem that you have to feed it so much electricity that paying that wage might still well turn out to be cheaper, even at western standards. At the moment LLMs burn money like crazy and it is unclear whether that can be fixed.
We're going to need several Kashiwazaki-Kariwa-sized or larger reactors to perform what a web search by a random person can do.
Re: (Score:2)
Remember how expensive electricity from nuclear is? That will not solve things...
Also remember that most uranium comes from Kazakhstan (43%), which borders China and Russia. Not a critical dependency you want. Place 2 is Canada (15%), which the US has just mightily pissed off by sheer leadership stupidity. US domestic? A whopping 0.15%...
Re: (Score:2)
> Remember how expensive electricity from nuclear is? That will not solve things...
> Also remember that most uranium comes from Kazakhstan (43%), which borders China and Russia. Not a critical dependency you want. Place 2 is Canada (15%), which the US has just mightily pissed off by sheer leadership stupidity. US domestic? A whopping 0.15%...
I don't disagree with any of that. And if we do decide to put ourselves in that position, is this glorified search engine going to be worth it? I don't think so.
That said, I think that before too long, we aren't going to need an entire nuclear generating facility to generate power to feed the tech bro wet dream. A guess, but a half-educated one, given the way innovation tends to work.
Re: (Score:2, Troll)
I don't think so. Remember, you don't have to worry about ChatGPT unionizing. In America, up until Trump won, there was a serious effort to develop unions nationwide. It was a major platform of our previous president. You wouldn't know it, because he kind of kept it on the down low in an effort to prevent the upper class from just spending trillions shutting him down.
I don't think the upper class actually saw the threat, but I do think Elon Musk needed Trump to be president because Musk's businesses are fai
Re: (Score:3)
I disagree. Generative AI cannot really do "automation". Far too unreliable. But we will see. Your argument definitely has some merit.
Re: (Score:2)
Businesses often prefer to minimize labor costs even when there's an overall increase to operating costs. Replacing humans with ChatGPT at a 20% markup over labor costs is still going to be an attractive prospect to many MBAs.
Re: (Score:1)
Alternating Current?
Re: (Score:2)
ChatGPT is a language model, and it excels in the production of language. In fact, its capabilities in that regime are far above those of even 80th-percentile humans.
Whoever thought a language model would be remotely good at chess clearly doesn't understand the technology they're working with.
Re: (Score:3)
I wanted to reply to GP with "Now ask the Atari chess program to summarize a 10-page PDF".
Cherry-picking goes both ways.
Re: (Score:2)
Surprisingly good is something different from good.
If you want to play chess, use one of the planning algorithm based engines. They are fast, easy to parallelize, easy to run with a time budget (i.e. they get better the more time you give them, but you can stop them any time and let them do the best move) and actually built to play chess.
Having an LLM play a game is a good way to show they generalize. It is not a good way to build a chess AI.
Many people don't get science. Someone shows they can find Waldo w
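As a rough illustration of the "time budget" property mentioned above: planning-based engines typically get it via iterative deepening, where every completed depth yields a full answer, so the search can be stopped at any moment with a usable move. A minimal sketch over a hypothetical toy game (states are integers, the move names and evaluation are made up):

```python
import time

def negamax(state, depth, children, evaluate):
    """Depth-limited negamax; returns (score, best_move)."""
    moves = children(state)
    if depth == 0 or not moves:
        return evaluate(state), None
    best_score, best_move = float("-inf"), None
    for move, nxt in moves:
        # The opponent's best score, negated, is our score for this move.
        score = -negamax(nxt, depth - 1, children, evaluate)[0]
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move

def anytime_search(state, children, evaluate, budget_s=0.05, max_depth=6):
    """Iterative deepening: deepen one ply at a time, keep the last
    completed depth's move, and stop when the budget runs out."""
    deadline = time.monotonic() + budget_s
    move = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break  # out of budget: return the move from the last depth
        move = negamax(state, depth, children, evaluate)[1]
    return move

# Hypothetical toy game: states are integers, each move adds 1 or 2,
# and the evaluation simply favors larger numbers for the mover.
children = lambda s: [("+1", s + 1), ("+2", s + 2)] if s < 10 else []
evaluate = lambda s: s
print(anytime_search(0, children, evaluate))
```

A real engine would add alpha/beta cutoffs, move ordering, and transposition tables on top of this loop, but the anytime structure is the same.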
Re: (Score:2)
> but it is inferior to both humans and specialized algorithms in nearly all cases.
In what way? The OP postulated pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves. That's the OP's point. ChatGPT is that weird friend of yours who somehow is a pub quiz ace, a true walking encyclopedia, yet someone who has no practical skills.
Re: (Score:2)
"pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves."
It depends where you are. In Russia everyone is taught chess.
(Of course there are no average persons in the street, they are all in Ukraine.)
Re: ChatGPT is not a chess engine (Score:3)
Well, the way I look at it is that AI models were trained on unchecked data, and they just reheat mistakes made in training because, statistically, mistakes are more common than good moves.
Garbage in. Garbage out.
Re: (Score:2)
LLM yes. Chess engines are more often trained with methods like self-play.
Re: (Score:2)
Hmm? No. I'm a mathematician. Instead of ad hominem attacks maybe try to address the actual points?
Re: (Score:2)
A lot of the 'headline' announcements, pro and con, are basically useless; but this sort of thing does seem like a useful cautionary tale in the current environment where we've got hype-driven ramming of largely unspecialized LLMs as 'AI features' into basically everything with a sales team; along with a steady drumbeat of reports of things like legal filings with hallucinated references; despite a post-processing layer that just slams your references into a conventional legal search engine to see if they r
Re: (Score:2)
Actually this is a very important result, because it highlights ChatGPT's strength and weakness. It's very good at dredging through vast amounts of text and forming principles of prediction, so that it can fake a human being's speech.
But it doesn't have any intellectual power at all - which is exactly what chess tests.
"On the chessboard, lies and hypocrisy do not survive long. The creative combination lays bare the presumption of a lie; the merciless fact, culminating in the checkmate, contradicts the hypoc
Re: (Score:2)
That LLM AIs are bad at abstract reasoning of this sort is not a new thing. People have seen that very early on with these systems, such as their inability to prove theorems. If someone thought that an LLM would be good at chess by itself in this situation they haven't been paying attention.
Re: (Score:2)
> But it doesn't have any intellectual power at all - which is exactly what chess tests.
All hail the Atari 2600, our intellectual power overlord! Right?
Replace ChatGPT with "autocomplete" (Score:2)
Replace "ChatGPT" or "AI" with "autocomplete" and all these AI headlines explain themselves.
Autocomplete loses in Chess!
Autocomplete makes up references!
Autocomplete said something stupid!
Re: (Score:1)
The difference is that ChatGPT said it was good at chess.