Microsoft Copilot Joins ChatGPT At the Feet of the Mighty Atari 2600 Video Chess (theregister.com)
- Reference: 0178273418
- News link: https://slashdot.org/story/25/07/03/2028252/microsoft-copilot-joins-chatgpt-at-the-feet-of-the-mighty-atari-2600-video-chess
- Source link: https://www.theregister.com/2025/07/01/microsoft_copilot_joins_chatgpt_at/
> By now, anybody with experience of today's generative AI systems will know what happened. Copilot's hubris was misplaced. Its moves were... interesting, and it managed to lose two pawns, a knight, and a bishop while the mighty Atari 2600 Video Chess was only down a single pawn. Eventually, Caruso asked Copilot to compare what it thought the board looked like with the last screenshot he'd pasted, and the chatbot admitted they were different. "ChatGPT deja vu."
>
> There was no way Microsoft's chatbot could win with this handicap. Still, it was gracious in defeat: "Atari's earned the win this round. I'll tip my digital king with dignity and honor [to] the vintage silicon mastermind that bested me fair and square." Caruso's experiment is amusing but also highlights the absolute confidence with which an AI can spout nonsense. Copilot (like ChatGPT) had likely been trained on the fundamentals of chess, but could not create strategies. The problem was compounded by the fact that what it understood the positions on the chessboard to be, versus reality, appeared to be markedly different.
>
> The story's moral has to be: Beware of the confidence of chatbots. LLMs are apparently good at some things. A 45-year-old chess game is clearly not one of them.
[1] https://www.linkedin.com/posts/robert-jr-caruso-23080180_last-episode-the-atari-2600-blew-up-chatgpt-activity-7345180141079175169-wSCk/
[2] https://games.slashdot.org/story/25/06/14/0421247/chatgpt-just-got-absolutely-wrecked-at-chess-losing-to-a-1970s-era-atari-2600
[3] https://www.theregister.com/2025/07/01/microsoft_copilot_joins_chatgpt_at/
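The mismatch the summary describes, between the board the chatbot believed in and the board in the screenshot, can be made concrete with a small sketch. Everything here is an assumption for illustration: the `board_diff` helper, the dictionary board representation, and the sample positions are hypothetical and are not taken from Caruso's experiment.

```python
from typing import Dict, Optional, Tuple

# A board is a mapping from square name to piece code, e.g. {"e1": "K"}.
# This representation is an illustrative assumption, not the format used
# in the original experiment.
Board = Dict[str, str]


def board_diff(believed: Board, actual: Board) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every square where the two boards disagree.

    Each value is a (believed_piece, actual_piece) pair; None means empty.
    """
    squares = sorted(set(believed) | set(actual))
    return {
        sq: (believed.get(sq), actual.get(sq))
        for sq in squares
        if believed.get(sq) != actual.get(sq)
    }


# Hypothetical positions: the model thinks its knight is still on g1,
# while the real board has it on f3.
believed = {"e1": "K", "c4": "B", "g1": "N"}
actual = {"e1": "K", "c4": "B", "f3": "N"}

print(board_diff(believed, actual))
```

An empty result would mean both views of the board agree; any entry is a square to reconcile before the next move is chosen.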
2600 chess is better than you think (Score:2)
It is just ludicrously slow. But as a chess player it's surprisingly good if you're willing to be very, very patient.
Re: (Score:2)
eh, no. I played that for a few rounds and got bored by beating it. I am on Chess.com if you are willing. I am rated around 1750. Not great but I play exciting games... with sacrifices and dramatic checkmates.
Re: (Score:1)
1750 on chess.com is like a god compared to most average players, isn't it? I don't think that's a good representation of how most people would fare against the Atari.
Re: (Score:2)
humbly, yes, I think 1750 is like a god. I was a child prodigy, and was a chess champion. I have many fond memories.
Re:2600 chess is better than you think (Score:5, Informative)
People elsewhere estimate the Atari 2600 Video Chess to be ~1300 Elo [1].
[1] https://www.reddit.com/r/chess/comments/cgno9u/how_strong_was_atari_2600s_video_chess/
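For a rough sense of what the two numbers in this thread imply, here is a minimal sketch of the standard Elo expected-score formula. It assumes the 1750 chess.com rating and the ~1300 estimate are on comparable scales, which they may not be, so treat the result as a ballpark only. The function name `expected_score` is just an illustrative choice.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B.

    A win counts as 1, a draw as 0.5, a loss as 0.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


human = 1750  # rating quoted upthread (chess.com scale, assumed comparable)
atari = 1300  # rough estimate from the linked Reddit discussion

print(f"{expected_score(human, atari):.2f}")  # prints 0.93
```

Under that assumption, the 1750 player would be expected to score roughly 93% against the Atari, which fits the "got bored by beating it" report above.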
Re: (Score:2)
back in the day, the programs did not know about pawns becoming queens when they hit the eighth rank, or did not recognize simple sacrifices to go for a checkmate. I got bored with them quickly.
not newsworthy (Score:2)
this is just click bait.
everyone knows these models are not good at actual gameplay, nor is it news that they will confidently misstate stuff. it wasn't news on the first round and it's still not news, and it misses the point that there is a ton of stuff that humans currently do which the models will do cheaper.
Re: (Score:2)
To pull at a string, I did play ChatGPT for 6 or 7 moves. It did do well. I know it scanned and consumed like.. all of the great chess games ever played. It can only predict the next word, or move. That seems like the nature of LLMs. If I ever can coax ChatGPT to play a whole chess game.. I will let you know the results.
Re: (Score:2)
I think it's kind of delicious to see chess used as a benchmark of intelligence again. Of course the chatbot could be augmented with a chess engine that it knows how to invoke to easily beat any human. But using an LLM as a chess engine itself is a nice challenge. Maybe there's a way to do it, or maybe AI as we know it needs more visualization capability. Or maybe if the AI can write a good chess engine given the rules.
In any case a single guy trying to prove a negative by failing to do something (set
Re: (Score:2)
I would prefer that an AI company just goes for General Intelligence. Maybe something self-aware. I sometimes think that I am smarter than and better than other people because I can beat them at chess, but then I hear my mom yelling at me, and possibly feel her slapping me on my butt, telling me that just because I am good at one thing does not mean that I am "better than" other people.
HVAC Repair... (Score:5, Interesting)
Was diagnosing an HVAC low delta problem on 3 hours of sleep and tried some LLMs as an experiment. They all rang 20 alarm fires, said the compressor is going to explode and absolutely just went all in on catastrophe.
Then I noticed the liquid line had a lower pressure than the suction line.
I reversed the probes.
The things that "AI" misses are outrageous. The language it uses is definitive and it draws on complex topics.
And it misses literally classroom 101 common sense sanity checks.
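The kind of sanity check described above can be written down in a few lines. This is a minimal sketch, assuming a running vapor-compression system in cooling mode with both gauge readings in the same units; the function name and the sample pressures are hypothetical, not measurements from the post.

```python
def probes_look_swapped(liquid_line: float, suction_line: float) -> bool:
    """Return True when the readings are physically implausible.

    On a running system the liquid (high-side) line should read a higher
    pressure than the suction (low-side) line. If it does not, the most
    likely explanation is swapped probes, not a failing compressor.
    """
    return liquid_line <= suction_line


# Hypothetical readings in psi: the "liquid line" reads lower than suction.
if probes_look_swapped(liquid_line=70.0, suction_line=250.0):
    print("Check probe placement before diagnosing anything else.")
```

Running a check like this first would have short-circuited the catastrophe diagnosis before any compressor theory came into play.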
Re: (Score:2)
I work with insanely high voltages, and with software too. I do research and development. We have to have a firm grasp of what AI is now and what it is not. It does not have common sense, it simply predicts the next word in a sentence. I find that amazing when I am writing software.
I found ChatGDP to be humble. (Score:2)
I tried to play chess with ChatGDP. It constantly said it was not designed to do this. I prodded it and got about 7 moves out of it. It is a chatbot and not a chess player. I know this and it knows this. It did play a great game, after I repeatedly asked, "If I played this, what would you do?" It is not a chess player.
Re: I found ChatGDP to be humble. (Score:3)
As a helpful AI assistant, I cannot do NP-complete tasks. I can only confidently pretend to do so.
Re: I found ChatGDP to be humble. (Score:2)
I was going to complain about your typo, but just realized that, in the context of the US tax and spending bill passing the House today, it was a clever reference to economic (like health) policy from misused LLMs.
Bravo. I applaud your subtlety
Re: (Score:2)
It was a Freudian slip. I caught it after I hit post. I thought.... OK.