News: 1751375947

Microsoft Copilot joins ChatGPT at the feet of the mighty Atari 2600 Video Chess

(2025/07/01)


Not content with humiliating ChatGPT at the hands of Video Chess on an Atari 2600 emulator, Robert Caruso has tried again, this time with Microsoft's Copilot.

Theoretically, the result would be the same, and Copilot would take a similar drubbing. Yet... what if Copilot triumphed where [1]ChatGPT could not? "There's no reason to think it would," [2]wrote Caruso, but... "Imagine everyone's head exploding if a MICROSOFT product outperformed ChatGPT."

So Caruso fired up the Stella emulator and had a pre-game chat with Copilot to explain what tripped up ChatGPT. He told the chatbot that one of the main reasons why ChatGPT lost was that it could not keep track of the board. If Copilot suffered the same difficulty, then there'd be little point in bothering to play.

With the confidence that only an AI chatbot could muster, Copilot insisted not only could it play chess, but it was also jolly good at it. Caruso said, "It claimed it could think 10–15 moves ahead — but figured it would stick to 3–5 moves against the 2600 because it makes 'suboptimal moves' that it 'could capitalize on... rather than obsess over deep calculations.'"

Humans strike back at Go-playing AI systems [4]READ MORE

And keeping track of the board? Copilot boasted, "I make a strong effort to remember previous moves and maintain continuity in gameplay, so our match should be much smoother."

Copilot admitted to having the same spatial memory gaps as ChatGPT, yet said it could analyze the current board and pick good moves. Caruso would need to give the chatbot a screenshot of the board after the Atari's move and feed Copilot's moves into Video Chess by hand.

The game was afoot!

By now, anybody with experience of today's generative AI systems will know what happened. Copilot's hubris was misplaced. Its moves were... interesting, and it managed to lose two pawns, a knight, and a bishop while the mighty Atari 2600 Video Chess was only down a single pawn. Eventually, Caruso asked Copilot to compare what it thought the board looked like with the last screenshot he'd pasted, and the chatbot admitted they were different.

[7]Chap claims Atari 2600 'absolutely wrecked' ChatGPT at chess

[8]Google offered millions to ally itself with trade body fighting Microsoft

[9]CrowdStrike apologizes to Congress for 'perfect storm' that caused global IT outage

[10]Microsoft wasn't CISPE's only suitor – it seems Google was willing to pay for its views on cloudy licensing to prevail

"ChatGPT déjà vu."

There was no way Microsoft's chatbot could win with this handicap. Still, it was gracious in defeat: "Atari's earned the win this round. I'll tip my digital king with dignity and honor [to] the vintage silicon mastermind that bested me fair and square."

Caruso's experiment is amusing but also highlights the absolute confidence with which an AI can spout nonsense. Copilot (like ChatGPT) had likely been trained on the fundamentals of chess, but could not create strategies. The problem was compounded by the fact that its picture of the positions on the chessboard appeared to differ markedly from reality.

The story's moral has to be: Beware of the confidence of chatbots. LLMs are apparently good at some things. A 45-year-old chess game is clearly not one of them. ®



[1] https://www.theregister.com/2025/06/09/atari_vs_chatgpt_chess/

[2] https://www.linkedin.com/posts/robert-jr-caruso-23080180_last-episode-the-atari-2600-blew-up-chatgpt-activity-7345180141079175169-wSCk/

[4] https://www.theregister.com/2023/02/20/human_go_ai_defeat/

[7] https://www.theregister.com/2025/06/09/atari_vs_chatgpt_chess/

[8] https://www.theregister.com/2024/11/28/google_offered_millions_to_cispe/

[9] https://www.theregister.com/2024/09/25/crowdstrike_to_congress_perfect_storm/

[10] https://www.theregister.com/2024/07/16/microsoft_google_cispe/




wolfetone

To be fair this has made me feel better about myself.

I'm shit at chess, so I'm glad my eventual replacements are equally as shit at it too.

Let's hear it for...

Neil Barnes

Sargon!

Horsey takes King Prawn

cyberdemon

Turns out both ChatGPT and Copilot have the IQ of 10,000 PE teachers?

Re: Horsey takes King Prawn

davefb

APRIL FOOL!

Re: Horsey takes King Prawn

wolfetone

We're talking jape of the decade...

Battle chess

original_rwg

I imagine if there could be a physical manifestation of these two A.I.'s, neither would stand a chance in a game of battle chess. I expect they might show all the physical agility of the robots playing football in this short clip https://www.bbc.co.uk/news/videos/c5ylkyrkjnzo

The future is looking so bright!

Re: Battle chess

Anonymous Coward

The future's so shite, I gotta wear braids.

It reminds me of ASIC chips

Eye Know

You can be a jack of all tasks and a master of none, or a specific piece of software that operates very efficiently.

Reasoning…

Phil Miesle

o4-mini, when challenged, determined it would use python-chess to manage board moves and update state, maintaining a FEN string in context.

I suspect a tool-capable reasoning model would beat the 2600 :)
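
As a rough illustration of the idea in this comment (keeping the board state outside the model and handing it back as a FEN string), here is a minimal sketch in Python. It is a standard-library stand-in written for this example, not python-chess itself: the function names are hypothetical, it does no legality checking, and it ignores castling, en passant, and promotion. It only produces the piece-placement field of a FEN string.

```python
# Hypothetical stand-in for the board tracking a chess library would provide.
# No legality checks, castling, en passant, or promotion: illustration only.

START_ROWS = [
    "rnbqkbnr",  # rank 8
    "pppppppp",  # rank 7
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",  # rank 2
    "RNBQKBNR",  # rank 1
]


def new_board():
    """Return the starting position as 8 lists of characters, rank 8 first."""
    return [list(row) for row in START_ROWS]


def square_to_index(square):
    """Convert a square such as 'e2' to (row, col), where row 0 is rank 8."""
    col = ord(square[0]) - ord("a")
    row = 8 - int(square[1])
    return row, col


def apply_move(board, move):
    """Apply a coordinate move such as 'e2e4' in place; captures overwrite."""
    from_row, from_col = square_to_index(move[:2])
    to_row, to_col = square_to_index(move[2:4])
    piece = board[from_row][from_col]
    if piece == ".":
        raise ValueError(f"no piece on {move[:2]}")
    board[from_row][from_col] = "."
    board[to_row][to_col] = piece


def placement_fen(board):
    """Return the piece-placement field of a FEN string for the board."""
    ranks = []
    for row in board:
        out, empty = "", 0
        for cell in row:
            if cell == ".":
                empty += 1  # count consecutive empty squares
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += cell
        if empty:
            out += str(empty)
        ranks.append(out)
    return "/".join(ranks)


board = new_board()
for move in ["e2e4", "e7e5", "g1f3"]:
    apply_move(board, move)
print(placement_fen(board))
# prints rnbqkbnr/pppp1ppp/8/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R
```

The point of the design is that the authoritative position lives in ordinary program state, so the model only ever has to read a freshly generated string rather than remember the board.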

ibmalone

"It claimed it could think 10–15 moves ahead — but figured it would stick to 3–5 moves against the 2600 because it makes 'suboptimal moves' that it 'could capitalize on... rather than obsess over deep calculations.'"

Imagine, if you will, someone on reddit or a comments section (maybe even here), or perhaps usenet (RIP), spouting off about something they have only a passing knowledge of. Now imagine that you're Alan Turing and you're attempting to distinguish between that and what we see above.

As for Copilot and chess, I conducted quite a different experiment recently, as our work 365 subscription now includes it. I asked for a t-shirt design with a particular chess opening on it and a specific text. Obviously it failed, first producing a kind of Etsy-esque view of half a design alongside half a t-shirt with a similar design (the design in question of course not being what I'd asked for).

After managing to refine to just giving me the print image, but getting ever further away from anything resembling a chess board as opposed to an assortment of chess themed images, I asked if it could just give me an image of a chess board in the starting position. There should at least be a good number in the training data, right? What I got back is best described as Howard Staunton's fever dream. The 9x10 board did have 2 rows of chess pieces at each end. In the centre file of which stood a monstrous queen with a spreading crown of spikes; appearing to rise out of the picture, they were quite a bit taller than any of the other pieces, including the two kings that flanked each one. For some reason black's pawns were three dimensional while white's lay flat. As you stared closer into it you realised that many squares shaded white into black. Lesser details like the strange hybrid bishops and the half-round, half-square rooks have faded in my memory. I haven't tried it since.

(The knights were surprisingly normal.)

Sicilian defense anyone?

Michael H.F. Wilkinson

I now have this mental image of what might happen if you asked for an image of the Sicilian defense. Some strange Staunton king/Mafia Don hybrid, perhaps, or Mount Etna in the middle of the board.

Re: Sicilian defense anyone?

ibmalone

I originally asked for the Vienna opening, but it (disappointingly in hindsight) did not include Mozart or Midge Ure in the response.

Turing Test

Mage

Much misunderstood.

Was it really proposed as evidence of AI, or the idea that a naive human could be fooled by a chat bot? If it's about fooling a naive human, then the proposition has been true since the 1960s, but by programs that have almost no practical value at all. What cruel executive decided "chatbots" could be used for customer support? That should be a crime with a jail term. So much frustration caused to so many.

Re: Turing Test

steelpillow

I think the problem is that too many executives and PE instructors would fail a genuine Turing test. Does make it hard to tell an AI trained on their shit from the originators.

LLMs good at some things.

Mage

Other than boasting, (or advertising copy – is that the same thing?) what are LLMs good for?

Re: LLMs good at some things.

Dan 55

I don't know, but boasting and hubris followed by failure seem a good match for management, perhaps why they think LLMs are great and are trying to foist them on the rest of us.

You make them sound...

Mishak

like an ideal replacement for Trump.

Or has that already happened?

Re: You make them sound...

steelpillow

Judging by Joe Biden's performance, Trump replaced an earlier generation of AI.

gpt-3.5-turbo-instruct is the only LLM that was ever good at chess

FeepingCreature

Nobody knows why, but it seems likely that some chess games snuck their way into the training corpus.

See https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

gpt-3.5-turbo-instruct is still available: https://platform.openai.com/docs/models/gpt-3.5-turbo?snapshot=gpt-3.5-turbo-instruct

As Microchess is estimated at 1200 Elo and Turbo Instruct at 1750 Elo, I suspect that would be a better fight. Make sure to use PGN text.
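
For anyone wanting to try the PGN suggestion, here is a minimal sketch of turning a list of moves into PGN-style text that a completion model could continue. The function names, the player names in the headers, and the choice of headers are all assumptions made for this example; nothing here calls a real model or library.

```python
def pgn_movetext(san_moves):
    """Join SAN moves into numbered movetext, e.g. '1. e4 e5 2. Nf3'."""
    parts = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:
            # White's moves are preceded by the move number.
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def pgn_prompt(san_moves, white="Player A", black="Player B"):
    """Build a PGN-style prompt; the header values are placeholders."""
    headers = [f'[White "{white}"]', f'[Black "{black}"]', '[Result "*"]']
    movetext = pgn_movetext(san_moves)
    if len(san_moves) % 2 == 0:
        # White is to move: end on the next move number so the
        # continuation is expected to start with White's move.
        movetext = f"{movetext} {len(san_moves) // 2 + 1}.".strip()
    return "\n".join(headers) + "\n\n" + movetext


print(pgn_prompt(["e4", "e5", "Nf3", "Nc6"]))
```

Ending the prompt on a bare move number is a design choice of this sketch: it nudges a text-completion model toward emitting a single move next rather than commentary.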
