Chap claims Atari 2600 'absolutely wrecked' ChatGPT at chess
- News link: https://www.theregister.co.uk/2025/06/09/atari_vs_chatgpt_chess/
So says infrastructure architect Robert Caruso, who over the weekend [1]posted the results of an experiment he conducted to “pit ChatGPT against the Atari 2600’s chess engine (via Stella emulator) and see what happens.”
ChatGPT confused rooks for bishops, and repeatedly lost track of pieces
Caruso decided to run the experiment after conversing with ChatGPT about the history of chess. At some point in that chat, the bot volunteered to play against the Atari – a reasonable suggestion as “Video Chess” was one of the games Atari commissioned for its console.
Online chatter among chess wonks discussing the merits of Video Chess suggests it played at a level beginners may have found challenging, and perhaps gave regular recreational players of intermediate skill a little to worry about.
Caruso thought his experiment would be “a lighthearted stroll down retro memory lane.”
Instead, he watched as the Atari humiliated ChatGPT.
“ChatGPT got absolutely wrecked on the beginner level,” he wrote.
“Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were.”
[5]Atari 400 makes a comeback in miniature form
[6]Christmas 1984: The last hurrah for 8-bit home computers
[7]Weeks with a BBC Micro? Good enough to fix a mainframe, apparently
[8]The New ROM Antics – building the ZX Spectrum 128
Caruso said ChatGPT blamed the icons Atari chess uses, calling them "too abstract to recognize". But even after he switched to standard chess notation, the chatbot "made enough blunders to get laughed out of a 3rd grade chess club."
“For 90 minutes, I had to stop it [ChatGPT] from making awful moves and correct its board awareness multiple times per turn,” he wrote. The chatbot “kept promising it would improve ‘if we just started over’.”
Eventually the bot conceded the game.
“Have you played Atari today?” Caruso asked, invoking the company’s advertising slogan.
“ChatGPT wishes it hadn't,” he concluded.
A challenge to readers
The Register knows many readers enjoy and operate retro-tech devices.
We challenge you to make them fight AI and let us know the results.
Send news of the results to us [11]here. ®
Get our [12]Tech Resources
[1] https://www.linkedin.com/posts/robert-jr-caruso-23080180_ai-chess-atari2600-activity-7337108175185145856-HSP0
[5] https://www.theregister.com/2024/01/16/atari_400_makes_a_comeback/
[6] https://www.theregister.com/2024/12/28/christmas_1984_home_computers/
[7] https://www.theregister.com/2025/03/21/on_call/
[8] https://www.theregister.com/2024/01/15/opinion_column_zxspectrum_128/
[11] https://www.theregister.com/Author/Email/Simon-Sharwood
[12] https://whitepapers.theregister.com/
Why?
Why would ChatGPT be any good at chess?
Playing chess is not a problem to which general knowledge, or even a basic understanding of the rules, offers a great solution.
Try training your AI model on chess game records and you would have, I suspect, a very different outcome.
Re: Why?
I once wrote an entire chess engine which played a half-decent game. The tricky bit wasn't encoding the rules of chess and the allowed moves, it was writing algorithms with strategies to win. It adds a new dimension to the game when you have to approach it from this perspective.
The program code has to be highly efficient too, as it faces an exponentially increasing amount of processing with each move ahead it analyses. That is the main limiting factor in its ability to play a good game. It sounds like ChatGPT doesn't even have the rules figured out, let alone any strategies. Not sure you could train it on chess game records, as there are virtually unlimited combinations of piece layouts.
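The exponential blow-up the commenter describes can be sketched with a minimal negamax search over an abstract game. This is a generic illustration, not the commenter's engine: the fixed branching factor of 30 (roughly chess's average) and the stub move/evaluation functions are this sketch's own assumptions.

```python
# Minimal negamax sketch for an abstract two-player game, illustrating
# why the amount of work grows exponentially with search depth.

def negamax(state, depth, moves_fn, apply_fn, eval_fn):
    """Return the best achievable score for the side to move,
    searching `depth` plies ahead."""
    if depth == 0:
        return eval_fn(state)
    best = float("-inf")
    for move in moves_fn(state):
        score = -negamax(apply_fn(state, move), depth - 1,
                         moves_fn, apply_fn, eval_fn)
        best = max(best, score)
    return best

# Toy demonstration: a game where every position has 30 legal moves.
# Counting leaf evaluations shows the blow-up: 30^depth positions.
calls = 0
def moves(_): return range(30)
def apply_move(s, m): return s
def evaluate(_):
    global calls
    calls += 1
    return 0

for depth in (1, 2, 3):
    calls = 0
    negamax(0, depth, moves, apply_move, evaluate)
    print(depth, calls)  # 30, 900, 27000 leaf evaluations
```

Real engines fight this growth with alpha-beta pruning and move ordering, but the underlying exponential remains, which is why depth is the limiting factor the comment mentions.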
Re: Why?
> It sounds like ChatGPT doesn't even have the rules figured out let alone any strategies.
Of course not. Why would it? That's not how LLMs work. They don't, by design, encode "rules" (or heuristics) in the sense of game-playing strategies.
> Not sure you could train it on chess game records as there are virtually unlimited combinations of piece layouts.
Well, humans manage that - and the most successful human chess players most certainly do train on game records. As do highly successful game-playing programs like Alpha GO (not an LLM).
Re: Why?
> The tricky bit wasn't encoding the rules of chess and the allowed moves, it was writing algorithms to have strategies to win ... virtually unlimited combinations of piece layouts
Curiously enough, back in the day, that was called "AI research": "Can a computer ever play Chess?" Yes[1]. Then it morphed into "Can a computer ever beat a human at Chess?" Yes[1]. Then "Can a computer ever beat a Grandmaster at Chess?" Yes[1].
One way that we can tell that these LLMs are still flailing around trying to actually be useful and cost-effective is that they are still being flogged by the salespeople as marvellous toys that all the cool kids have. If the mechanisms are ever entirely subsumed into the day-to-day "100 Algorithms That Every Effective Programmer Must Know" then we will see that they are actually worthwhile - but by then, guess what[1].
[1] "If it works, it isn't AI"
Re: Why?
Playing chess requires intelligence. A1 has none.
Re: Why?
> Playing chess requires intelligence.
Apologies for stating the bleeding obvious, but there are (non-LLM) chess programs that can compete at grandmaster level. Are you prepared to call them intelligent?
Chess (and other game) playing occupies at best a small niche in the ecosystem of problems/tasks that require what we might like to label as "intelligence". LLMs are clearly not suitable for that niche - there are other ML systems that fare much better, because that's what they were designed for. Whatever LLMs are designed for (and that's far from clear to me), chess is clearly not included.
Re: Why?
Only in the sense that they encapsulate the human intelligence that went into their design.
I realize that is a very low bar (too low, I agree), so maybe I can put it a little better:
A1 is less intelligent than 32kb of (non-intelligent) machine code created by a human.
Re: Why?
> Try training your AI model on chess game records and you would have, I suspect, a very different outcome.
Well, ChatGPT - like the other LLMs - was trained by pulling in everything that could be found on the web, and every other digitised text source, which includes the complete rules of Chess (many times over, in book and web form), more books discussing the Great Players and their strategies and, of course, [1]play-by-play records of games at every level (chess by email, forum posts, and so on).
So it had all the data that you suggest - and yet it failed.
Now, using *sensible* ML techniques on Chess rules *will* create a machine that can play - you can literally even do it with (big enough) matchboxes, in precisely the same way as you can [2]train matchboxes to play tic-tac-toe - although it will take some time for that mechanism to complete its learning phase[1]. But, logically, it will work. What is the MAJOR difference between that ML and ChatGPT? The matchboxes are trained and punished/rewarded against a specific goal and with specific metrics. ChatGPT - nah.
> Why would ChatGPT be any good at chess?
Why would ChatGPT be good at *anything*? Unlike the matchboxes, it has not been trained with any sane goals - it won't be good at designing airplane wings, or baking bread, or writing Python code.
> not a problem to which general knowledge, or even basic understanding of the rules, gives a great solution.
There are NO problems to which general knowledge is the solution - except for winning pub quizzes or beating The Chase[2]. Knowledge plus nous - that'll get the job done. Heck, nous, time and an inquisitive mind will get over the lack of general or even specific knowledge[3].
So what great solution can ChatGPT or its cousins give us?[4]
[1] the Heat Death of The Universe says "hello"
[2] and the prizes you get for either do not compensate for building an entire LLM.
[3] we call this "research"
[4] well, comments on El Reg do say that it works as a super-ELIZA, but there are cheaper ways to achieve that goal.
[1] https://www.365chess.com/tournaments/GCT_Blitz_Croatia_2021_2021/44438#
[2] https://en.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_Crosses_Engine
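The matchbox idea can be sketched in a few lines of code. Here it is applied to a tiny game of Nim (heap of 10, take 1-3 stones, last stone wins) rather than noughts and crosses, purely to keep the state space small; the reinforce/punish update is the MENACE mechanism the comment describes, but the choice of game, bead counts, and training length are this sketch's own assumptions.

```python
import random

# MENACE-style "matchbox" learner. Each "box" is a game state holding
# "beads" (candidate moves with weights); after each self-play game the
# winner's moves gain a bead and the loser's moves lose one - training
# against a specific goal with specific metrics, as the comment puts it.

random.seed(0)
boxes = {}  # state (stones left) -> {move: bead count}

def pick(state):
    # Create the box on first visit: 3 beads per legal move.
    beads = boxes.setdefault(state, {m: 3 for m in (1, 2, 3) if m <= state})
    moves, weights = zip(*beads.items())
    return random.choices(moves, weights)[0]

def train(games=5000):
    for _ in range(games):
        heap, history, player = 10, [], 0
        while heap > 0:
            move = pick(heap)
            history.append((player, heap, move))
            heap -= move
            player ^= 1
        winner = history[-1][0]  # whoever took the last stone
        for p, state, move in history:
            delta = 1 if p == winner else -1
            # Floor at 1 bead so no move is ever erased entirely.
            boxes[state][move] = max(1, boxes[state][move] + delta)

train()
# With enough games, the box for heap=10 should come to prefer taking 2,
# leaving a multiple of 4 - the known winning strategy for this Nim.
print(boxes[10])
```

The learning phase here takes seconds rather than the Heat Death of the Universe, which is the point of simulating the matchboxes instead of buying them.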
Re: Why?
> Why would ChatGPT be good at *anything*? Unlike the matchboxes, it has not been trained with any sane goals
Well, to be fair, it does have an ostensible "goal": to generate plausibly human-like textual responses to textual prompts. Whether you think that's a sane (or even a useful) goal is another question.
Slippery Slope
Let's not start asking ChatGPT if it wants to play a game; it's only a matter of time before it gets round to global thermonuclear war.
Re: Slippery Slope
But that can be beaten by a simple game of TicTacToe.
Re: Slippery Slope
Heh heh. You might say that current strategic "thinking" around thermonuclear war has all the sophistication of TicTacToe.
Gotham did it first
Levy Rozman (GothamChess on YouTube) has run 'competitions' between lots of the AI models. All were terrible. Pieces disappearing and reappearing, new pieces materialising from nowhere, illegal moves etc.
I doubt that any could beat the ZX81 1K chess game at this point.
"confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were"
I wonder why that reminded me of a certain President and administration?
eight-bit processor
Curious about which processor that was. According to WikiP it was a 6507, a 28-pin version of the 40-pin DIP 6502. It had 12 pins amputated, including A13-A15, leaving only 13 address lines (8K).
A decent chess program running in 128 bytes of RAM and an 8K address space is truly impressive. Beating ChatGPT at anything cleverer than tic-tac-toe† is not so impressive, I would have thought.
† anyone losing a game of noughts and crosses is probably true manglement material and definitely C suite.
The current iterations of AI have no "memory", in the sense that each prompt is evaluated against the data for that model with no awareness of previous prompts. It's a large language model, but it has no concept of "learning" from past results until data is fed into a new model. So something like a chess game is going to completely fail to get a decent solution, since the AI cannot evaluate future moves against past ones.
> So something like a chess game is going to completely fail to get a decent solution since the AI cannot evaluate future moves against past ones.
Well, it might conceivably encode strategies for doing that. (It clearly doesn't.)
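The statelessness point is easy to illustrate: in a chat-style setup it is the caller, not the model, that carries the game state, and the whole history has to be re-sent on every turn. `ask_llm` below is a hypothetical stand-in for any stateless text-completion call, hard-wired to a canned reply so the sketch is runnable; no real API is being invoked.

```python
# Sketch of why a chat model only *seems* to remember a game:
# the client keeps the history and replays it in every prompt.

def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real call would send `prompt` to a
    # model and return its reply. Each call is independent of the last.
    return "e5"

moves = []  # the *client* keeps the game history, not the model

def play_move(my_move: str) -> str:
    moves.append(my_move)
    # The full move list must be re-sent every turn; omit it and the
    # model has no idea a game is even in progress.
    prompt = ("We are playing chess. Moves so far: " + " ".join(moves)
              + "\nYour move?")
    reply = ask_llm(prompt)
    moves.append(reply)
    return reply

play_move("e4")
print(moves)  # ['e4', 'e5']
```

Whether the model then reasons correctly about the board it has been shown is, as the article demonstrates, a separate problem.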
But was ChatGPT as disappointed..
as we were when we opened our birthday presents and our parents had bought us Atari Chess???
"kept promising it would improve ‘if we just started over’."
Possibly, until after about the 5th move where it'll be just as bad all over again.
It’s learning. Did it claim at any point that it had won when it actually hadn’t? That was a favourite trick deployed by my sons - and they’re quite reasonable players now. But it did take many years for them to reach their current level.
Chess is a game with very strict semantic rules and a punitive branching factor, so good solvers need very efficient depth-first search with pruning heuristics, combined with long-term forecasting and responses to an opponent's changing strategy.
In contrast, LLMs are context-sensitive language models trained, for the most part, on scraped English-language data, where semantics are at best decided on the fly and no real hard rules apply.
They have some use as improved search engines (with lots of duct tape) or to figure out previous solutions to related queries, but that's about it. Some parts of the hype are essentially Moravec's paradox in reverse: they seem to solve tasks that cost us time as if by magic, but get stumped by problems we consider much harder (theorem proving, combinatorial search).
Most LLMs ironically perform atrociously in context free languages, where they could leverage the rules of the language, instead favoring correlation and massive context windows.
Ask an LLM for a BibTeX entry for a paper, and you get very high hallucination rates.
It is in part why LLMs can suggest code that will never compile, let alone make semantic sense, whereas a heuristic constrained by the language's grammar would never suggest it, because it lies outside the valid search space.
Not surprising
The thing is, LLMs have no clue what chess actually is or which moves are actually valid. My kids used to play like that; it was more like Calvinball...