How Badly Did ChatGPT and Copilot Fail to Predict the Winners of the Kentucky Derby? (courier-journal.com)

(Sunday May 04, 2025 @03:34AM (EditorDavid) from the wanna-bet? dept.)

In 2016, an online "swarm intelligence" platform stunned horse-racing fans by [1]making a correct prediction for the Kentucky Derby, naming all four top finishers in order. (But the next year its predictions [2]weren't even close, with TechRepublic suggesting 2016's race just had an unusual cluster of obvious picks.)

Since then it's become almost a tradition: asking AI to predict the winning horses each year, then seeing how close it came. Before today's race, professional bookmakers gave a horse named "Journalism" the best odds of winning, but could AI make a better prediction? [3]USA Today reports:

> The USA TODAY Network asked Microsoft Copilot AI to simulate the order of finish for the 2025 Kentucky Derby field based on the latest odds, predictions and race factors on Thursday, May 1. Journalism came out on top in its projection. The AI-generated response cited Journalism's favorable post position (No. 8), which has produced the second-most Kentucky Derby winners, and a four-race winning streak that includes last month's Santa Anita Derby.

ChatGPT also picked the exact same horse, [4]according to FanDuel. But in fact, the winning horse turned out to be "Sovereignty" (a horse Copilot predicted would finish second). Meanwhile Copilot's pick for first place ("Journalism") finished second.

But after that Copilot's picks were way off...

Copilot's pick for third place was a horse named Rodriguez, which hours later was [5]scratched from the race altogether. (And the next day Copilot's pick for 10th place [6]was also scratched.)

Copilot's pick for fourth place was "Sandman", who finished in 18th place.

Copilot's pick for fifth place was "Burnham Square", who finished in 11th place.

Copilot's pick for sixth place was "Luxor Cafe", who finished in 10th place.

Copilot's pick for seventh place was "Render Judgment", who finished in 16th place...

An online racing publication also [7]asked "a trained AI LLM tool" for its predictions, and received a wildly uneven result:

Burnham Square (finished 11th)

Journalism (finished 2nd)

Sandman (finished 18th)

Tiztastic (finished 15th)

Baeza (finished 3rd)

[1] https://news.slashdot.org/story/16/05/10/0130206/swarm-ai-correctly-predicts-kentucky-derby-superfecta-turns-20-into-11000

[2] https://slashdot.org/story/17/05/07/026240/swarm-ai-spectacularly-fails-to-predict-kentucky-derby-winners-a-second-time

[3] https://www.courier-journal.com/story/sports/derby-hq/2025/05/02/kentucky-derby-2025-predictions-ai-picks-winner-results/83392294007/

[4] https://www.fanduel.com/research/2025-kentucky-derby-picks-experts-vs-ai

[5] https://www.bloodhorse.com/horse-racing/articles/284238/rodriguez-scratched-from-kentucky-derby

[6] https://www.nbcsports.com/horse-racing/news/kentucky-derby-field-reduced-to-19-with-scratch-of-grande-leaving-owner-mike-repole-shocked

[7] https://www.twinspires.com/edge/racing/kentucky-derby/experts-vs-ai-revisited-who-will-win-the-2025-kentucky-derby/

Well obviously reality is at fault (Score:2)

by 50000BTU_barbecue ( 588132 )

AI is wonderful, AI is powerful, AI should be everywhere, in everything, all the time.

Re: (Score:2)

by Kernel Kurtz ( 182424 )

> AI is wonderful, AI is powerful, AI should be everywhere, in everything, all the time.

I'm curious: if different people ask the same AI the same predictive question at the same time, do they all get the same answer every time?

Re: (Score:2)

by e432776 ( 4495975 )

Excellent question! My thought was that these computer programs are not really "knowledge" engines; they are engines that produce a statistically driven facsimile of a reasonable answer to a question. Though clearly the area is developing very fast and I am not keeping up with everything, I think this is still the case.

Re: (Score:2)

by zekica ( 1953180 )

No. All LLMs have a setting you can tune (if using the API or running locally); in the chat interface it is set by the provider and not tunable. This temperature setting changes how the next token is chosen, and since both input and output tokens influence the next one, the answers might or might not start with the same words but will diverge, so the facts (there is no such thing as facts in LLMs, there are related abstract concepts joining tokens in the latent space) included in the answer will al
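To make the temperature point above concrete, here is a minimal Python sketch of temperature-scaled sampling. It is an illustration under stated assumptions, not how any particular provider implements it: the three-token vocabulary and the logit values are invented for this example (the horse names are borrowed from the summary, the scores come from no real model), and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick one token index; temperature 0 is treated as greedy (argmax)."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical vocabulary and scores, invented for this example.
TOKENS = ["Journalism", "Sovereignty", "Sandman"]
LOGITS = [2.0, 1.5, 0.5]

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so repeated runs can differ
    for temperature in (0.0, 0.7, 1.5):
        picks = [TOKENS[sample_token(LOGITS, temperature, rng)] for _ in range(10)]
        print(temperature, picks)
```

In this toy setup, temperature 0 always returns the highest-scoring token, while higher temperatures spread the picks across the vocabulary. Since each chosen token feeds into the next choice in a real model, an early difference is enough for two otherwise identical requests to diverge.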

Re: (Score:2)

by Mr. Dollar Ton ( 5495648 )

No. It is reliably hallucinating all the time and not only about the future. Ask the "AI" anything that is a series of numbers twice in a slightly different manner and you're practically guaranteed to get quite different answers.

Re: (Score:2)

by martin-boundary ( 547041 )

This [1]video [youtube.com] has a nice visualization by some researchers from Anthropic showing how an LLM attempts to calculate a simple sum 36 + 59.

Starts at 1:51, or you can find their original [2]paper [transformer-circuits.pub] (sorry it's not in PDF).

[1] https://www.youtube.com/watch?v=-wzOetb-D3w

[2] https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-addition

Re: (Score:2)

by nospam007 ( 722110 ) *

It also fails every week to predict the lotto numbers.

Re: (Score:2)

by geekmux ( 1040042 )

> AI is wonderful, AI is powerful, AI should be everywhere, in everything, all the time.

Let's see how well it will excel at leading a Gamblers Anonymous meeting then.

Obvious shitty problem, is obvious.

Re: Well obviously reality is at fault (Score:2)

by vbdasc ( 146051 )

I hate to be the one defending AIs, but these things aren't made to predict basically random events. Anyone who tries to use AIs as gambling assistants is monumentally dumb, even dumber than the AIs.

To those who will say that a horse race isn't random: the AI, as a basically statistical machine, has no way of knowing the real factors that determine the race's outcome.

How badly did the editor fail to read the winners? (Score:2)

by mcmonkey ( 96054 )

ChatGPT at least has the excuse of making a prediction before the race. The "person" writing TFA got the finishing order wrong. (I assume the submission of actually from AI.)

Sandman finished 7th, not 18th. Burnham Square finished 6th, not 11th.

Re: How badly did the editor fail to read the winn (Score:3)

by 50000BTU_barbecue ( 588132 )

"I assume the submission of actually from AI."

What kind of dressing goes with AI word salad?

Re: How badly did the editor fail to read the win (Score:4, Funny)

by LindleyF ( 9395567 )

Thousand AIsland

Re: How many horses did they put down? (Score:2)

by vbdasc ( 146051 )

Well, at least they're not eaten. Usually. Humans are cruel towards animals. This is the way of humanity.

Re: How many horses did they put down? (Score:2)

by vbdasc ( 146051 )

Forgot to mention that humans are cruel to their fellow humans too... Have you watched an MMA event lately?

Nonsense (Score:1)

by rot16 ( 4603585 )

I don't think AI predictions on horse racing are something an average Slashdot reader is interested in. This is as yellow and useless as "tech news" can get.

Re: (Score:2)

by korgitser ( 1809018 )

The Kentucky Derby is Decadent and Depraved?

PLEASE DO POST (Score:1)

by gavron ( 1300111 )

PLEASE DO POST everyone's predictions for some stupid 1800s horse race that were either correct or incorrect.

Start with real people who have real money who gambled.

Then go with real people who didn't gamble, but IF THEY DID GAMBLE, this is how they'd do it.

Then add ChatGPT, Copilot, and other glorified spellcheckers.

Seriously, Slashdot on a weekend.

Wake me up when there's something new.
