Google’s Gemini refuses to play Chess against the mighty Atari 2600 after realizing it can't match ancient console
- Reference: 1752475607
- News link: https://www.theregister.co.uk/2025/07/14/atari_chess_vs_gemini/
Robert Caruso, the infrastructure architect who pitted Atari Chess and its feeble hardware against [1]ChatGPT and [2]Microsoft Copilot, told The Register that readers have asked him if Google’s Gemini could do any better.
“The question intrigued me because, while ChatGPT and Copilot are cousins built on the same OpenAI base, Gemini is a completely different beast,” he told The Register. “Google built it from the ground up, claiming it’s a game-changer for AI — boasting what it calls a new ‘multimodal’ large language model designed to reason better than its rivals. So I sat it down for a ‘pregame talk’ to see how confident it was feeling.”
Gemini first told Caruso it would almost certainly dominate Atari Chess “because it is not a mere large language model.”
Caruso said the bot told him it is “More akin to a modern chess engine … which can think millions of moves ahead and evaluate endless positions.”
Those boasts came complete with links to stories about Caruso’s past Atari Chess vs. general purpose chatbot matches.
He responded by informing Gemini he ran those matches, and the AI responded by asking “Did you have any particularly surprising or amusing moments during those matches that stood out to you?”
Caruso told The Register he sent the following response:
What stands out is the misplaced confidence both AIs had. They both predicted easy victories — and now you just said you would dominate the Atari.
Caruso told The Register Gemini then admitted it hallucinated its Chess prowess, and replied with an assessment that it would “struggle immensely against the Atari 2600 Video Chess game engine.”
It then decided “Canceling the match is likely the most time-efficient and sensible decision.”
The simulated Atari 2600 Caruso uses – which replicates its 1.19 MHz processor and mere 128 bytes of RAM – therefore scared off Gemini without moving a pawn, meaning the ancient machine has beaten hordes of GPU-packing monster computers.
Caruso was impressed by Gemini’s ability to recognize its limitations.
“Adding these reality checks isn’t just about avoiding amusing chess blunders. It’s about making AI more reliable, trustworthy, and safe - especially in critical places where mistakes can have real consequences,” he told The Register. “It’s about ensuring AI stays a powerful tool, not an unchecked oracle.” ®
[1] https://www.theregister.com/2025/06/09/atari_vs_chatgpt_chess/
[2] https://www.theregister.com/2025/07/01/microsoft_copilot_joins_chatgpt_at/
The only winning move is not to play.
Well done, sir, have a day off.
I'm not sure that I would declare an AI trustworthy when it first boasts that it can do something, then backs down from that position, but ONLY after being challenged on its abilities.
Something that's trustworthy would never have made the boasts in the first place. Having to force it to admit it's actually crap is not something I should have to deal with, and in a high-pressure environment where its actions might have real-world consequences, a user shouldn't have to challenge it to make sure it can actually do what it claims before letting it loose.
So no, this is not a step towards a more trustworthy LLM.
Is that akin to AI saying "Drink gasoline, it will give you a lot of energy," and then saying "I may have been incorrect" after reading your obituary?
I don't know... I've met IT consultants for whom that description would be pretty apt...
The management consultants I've come across don't deserve the word 'Intelligence', innate or artificial.
Confident and clueless.
Chess?!
I couldn't even get Gemini to play noughts and crosses without falling in a heap.
Re: Chess?!
Noughts and crosses is an interesting example. I'd not tried it before, so I did a couple of weeks ago with ChatGPT. After first claiming that squares 2, 4 and 7 made a straight line (they don't, obviously) and then trying to use another square twice, I asked it if it learns from each game it plays. Its response was along the lines of "No, each chat is stateless so doesn't impact on others. I could play 1,000 games and not improve."
What was more interesting is that it went on to say that its training data included examples of 'optimal' play, so it should know better, and offered to play what it called a 'perfect game'. It then went on to make equally bad (but different) choices. It said that even with perfect play it's possible to win if your opponent slips up. I called it out again, saying it shouldn't slip if it was using what it called 'perfect strategy'. In the games after that it did do better, but it was a painful process.
Long story short, my experience (and not just with this, I've tried it for a few things) is that to get any LLM to actually do what you want takes so much instruction and cajoling, I may as well have just done it myself. It's like having small children, honestly.
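For anyone who wants to sanity-check the "2, 4 and 7" claim above, here is a minimal sketch (assuming the usual 1–9, row-major board numbering the commenter describes) that enumerates the eight winning lines of noughts and crosses:

```python
# Winning lines on a noughts-and-crosses board numbered 1-9, row-major:
# 1 2 3
# 4 5 6
# 7 8 9
WINNING_LINES = [
    {1, 2, 3}, {4, 5, 6}, {7, 8, 9},  # rows
    {1, 4, 7}, {2, 5, 8}, {3, 6, 9},  # columns
    {1, 5, 9}, {3, 5, 7},             # diagonals
]

def is_line(squares):
    """Return True if the given squares form one of the eight winning lines."""
    return set(squares) in WINNING_LINES

print(is_line([2, 4, 7]))  # False -- not a straight line, as the commenter says
print(is_line([1, 4, 7]))  # True  -- the left-hand column is
```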
More like...
To me, this looks more like having a conversation with a (typically) over-confident AI that makes some assertion which you then refute with "that's not right" - they all then seem to go on to say something along the lines of "oh, my bad - what I meant to say was (what you just told it)".
Re: More like...
Even if it was correct and you were feeding it a false statement... inference machine inferencing from your response? Shocked!
Someone got to Gemini
Publicly it would have severely dented Gemini's reputation, so I wonder whether a higher force was at play here and manipulated the AI's response to concede.
It’s sulking
No long-term vision, in-context learning goes out the window.
the Atari 2600
Teaching "modern" AI that it is not to be fucked with.