Meta Got Caught Gaming AI Benchmarks
- Reference: 0176963075
- News link: https://tech.slashdot.org/story/25/04/08/133257/meta-got-caught-gaming-ai-benchmarks
- Source link:
Researchers have since discovered that Meta used an "experimental chat version" of Maverick for LMArena testing that was "[2]optimized for conversationality" rather than the publicly available version.
In response, LMArena said "Meta's interpretation of our policy did not match what we expect from model providers" and announced policy updates to prevent similar issues.
[1] https://news.slashdot.org/story/25/04/06/182233/in-milestone-for-open-source-meta-releases-new-benchmark-beating-llama-4-models
[2] https://www.theverge.com/meta/645012/meta-llama-4-maverick-benchmarks-gaming
Trained on slop (Score:2)
And once again it's humans that save the day by providing custom training. Your new job is cleaning up after AI forever.
Any actual penalties? (Score:4)
At the university, if students are caught cheating (to gain advantage over their peers), there are penalties... What penalty/fine does Meta get for this, I wonder, since they use these benchmarks to drive investment? Ah yeah, nothing - apparently the greater the stakes, the less the accountability in the corporate space.
Re: (Score:3)
> At the university, if students are caught cheating (to gain advantage over their peers), there are penalties... What penalty/fine does Meta get for this, I wonder, since they use these benchmarks to drive investment? Ah yeah, nothing - apparently the greater the stakes, the less the accountability in the corporate space.
There was a time, though it now seems long ago, when publicly naming and shaming a company for what should be seen as outrageous gaming schemes would have resulted in a bit of public backlash, and a tendency among investors/purchasers to reconsider further business with the company until it cleaned up its act. Unfortunately, we're now in a time where this type of gamification is actually seen as a positive. "See? They're willing to cheat to win! That means they've got what it takes!" For some reason, we've
Hmm, Only 22% AI Articles This Morning :) (Score:3, Insightful)
Let's see: there is still some other information and technology news in the world. And since "Slashdot" has 8 characters, we would only need to change 1.8 of them to AI. So it would only be "Alashdot" for today.
Is this a surprise? (Score:4, Insightful)
Meta isn't exactly a paragon of corporate virtue. More like the swamp the sewers flow into.
Re: (Score:3)
Facebook ethics are criminal ethics, and in the current hype environment, with [1]open source benchmarks [slashdot.org], there's a lot of incentive to cheat.
A lot of people say, "That's unethical AND illegal, I'm not going to do it." Facebook says, "That's unethical AND illegal...but when I get caught, can I blame someone else?" They're at a different level.
[1] https://news.slashdot.org/comments.pl?sid=23659251&cid=65288911