News: 0176963075

Meta Got Caught Gaming AI Benchmarks

(Tuesday April 08, 2025 @11:04AM (msmash) from the how-about-that dept.)


Meta released [1]two new Llama 4 models over the weekend -- Scout and Maverick -- with claims that Maverick outperforms GPT-4o and Gemini 2.0 Flash on benchmarks. Maverick quickly secured the number-two spot on LMArena, behind only Gemini 2.5 Pro.

Researchers have since discovered that Meta used an "experimental chat version" of Maverick for LMArena testing that was "[2]optimized for conversationality" rather than the publicly available version.

In response, LMArena said "Meta's interpretation of our policy did not match what we expect from model providers" and announced policy updates to prevent similar issues.



[1] https://news.slashdot.org/story/25/04/06/182233/in-milestone-for-open-source-meta-releases-new-benchmark-beating-llama-4-models

[2] https://www.theverge.com/meta/645012/meta-llama-4-maverick-benchmarks-gaming



Is this a surprise? (Score:4, Insightful)

by Snotnose ( 212196 )

Meta isn't exactly a paragon of corporate virtue. More like the swamp the sewers flow into.

Re: (Score:3)

by phantomfive ( 622387 )

Facebook ethics are criminal ethics, and in the current hype environment, with [1]open source benchmarks [slashdot.org], there's a lot of incentive to cheat.

A lot of people say, "That's unethical AND illegal, I'm not going to do it." Facebook says, "That's unethical AND illegal...but when I get caught, can I blame someone else?" They're at a different level.

[1] https://news.slashdot.org/comments.pl?sid=23659251&cid=65288911

Trained on slop (Score:2)

by xack ( 5304745 )

And once again it's humans that save the day by providing custom training. Your new job is cleaning up after AI forever.

Any actual penalties? (Score:4)

by fleeped ( 1945926 )

At the university, if students are caught cheating (to gain advantage over their peers) there are penalties... What penalty/fine does Meta get for this I wonder, since they use these benchmarks to drive investment? Ah yeah, nothing - apparently the greater the stakes the less the accountability in the corporate space.

Re: (Score:3)

by nightflameauto ( 6607976 )

> At the university, if students are caught cheating (to gain advantage over their peers) there are penalties... What penalty/fine does Meta get for this I wonder, since they use these benchmarks to drive investment? Ah yeah, nothing - apparently the greater the stakes the less the accountability in the corporate space.

There was a time, though it now seems long ago, when publicly naming and shaming a company for what should be seen as outrageous gaming schemes would have resulted in a bit of public backlash, and a tendency among investors/purchasers to reconsider further business with the company until it cleaned up its act. Unfortunately, we're now in a time where this type of gamification is actually seen as a positive. "See? They're willing to cheat to win! That means they've got what it takes!" For some reason, we've

Hmm, Only 22% AI Articles This Morning : ) (Score:3, Insightful)

by BrendaEM ( 871664 )

Let's see, there is still some other information and technology news in the world. And since "Slashdot" has 8 characters, 22% means we would only need to change about 1.8 characters to AI. So it would only be Alashdot, for today.
