After Meta Cheating Allegations, 'Unmodified' Llama 4 Maverick Model Tested - Ranks #32 (neowin.net)

(Sunday April 13, 2025 @09:34PM (EditorDavid) from the going-rogue dept.)

Remember how last weekend Meta claimed its "Maverick" AI model (in the [1]newly-released Llama-4 series) beat GPT-4o and Gemini Flash 2 "on all benchmarks... This thing is a beast."

And then how within a day [2]several [3]AI [4]researchers pointed out that even Meta's own announcement admitted the Maverick tested on LM Arena was an "experimental chat version," as [5]TechCrunch noted. ("As we've [6]written about before, for various reasons, LM Arena has never been the most reliable measure of an AI model's performance. But AI companies generally haven't customized or otherwise fine-tuned their models to score better on LM Arena - or haven't admitted to doing so, at least.")

On Friday, TechCrunch reported on [7]what happened when LMArena tested the unmodified release version of Maverick (Llama-4-Maverick-17B-128E-Instruct).

It ranked 32nd.

"For the record, older models like Claude 3.5 Sonnet, released last June, and Gemini-1.5-Pro-002, released last September, rank higher," [8]notes the tech site Neowin.

[1] https://news.slashdot.org/story/25/04/06/182233/in-milestone-for-open-source-meta-releases-new-benchmark-beating-llama-4-models

[2] https://x.com/natolambert/status/1908913635373842655

[3] https://x.com/suchenzang/status/1908938638869909724

[4] https://x.com/ZainHasan6/status/1908943306936967597

[5] https://techcrunch.com/2025/04/06/metas-benchmarks-for-its-new-ai-models-are-a-bit-misleading/

[6] https://techcrunch.com/2024/09/05/the-ai-industry-is-obsessed-with-chatbot-arena-but-it-might-not-be-the-best-benchmark/

[7] https://techcrunch.com/2025/04/11/metas-vanilla-maverick-ai-model-ranks-below-rivals-on-a-popular-chat-benchmark/

[8] https://www.neowin.net/news/unmodified-llama-4-maverick-ranks-below-rivals-following-meta-cheating-allegations/

Re: (Score:2)

by martin-boundary ( 547041 )

...Only because they called the Egyptians "dumbfucks"!

Re: (Score:2)

by FudRucker ( 866063 )

the bible is a fiction [1]https://m.youtube.com/watch?v=... [youtube.com]

[1] https://m.youtube.com/watch?v=Iep4gnmJeRE

Oh come on (Score:3)

by 93 Escort Wagon ( 326346 )

There's no way Zuckerberg's company would lie to us!

I don't think it would be a good idea (Score:2)

by rsilvergun ( 571051 )

To cheat at AI programming right now. There are insane amounts of money being thrown about by extremely rich, powerful people and they will throw your ass in jail if they catch you.

You can rip off as many little old ladies as you want, and if you can come up with an AI scam that only rips off little old ladies of their life savings, go right ahead I guess. Lord knows for the next 4 years there is going to be absolutely no law enforcement around those scams, but God help you if you rip off one of the rich people you

Re: I don't think it would be a good idea (Score:2)

by Big Hairy Gorilla ( 9839972 )

And that is the software biz in the big leagues these days ... but in somewhat vague terms... Snake oil IS the biz... so, basically you're not quite cynical enough ... I'm thinking, who isn't pitching dreams here?

Cold war (Score:2)

by devslash0 ( 4203435 )

That's all it is. Counted in the megawatts of energy spent to train completely useless models.

Oblig. Milo Murphy Reference (Score:2)

by R3d M3rcury ( 871886 )

[1]Llama, you're the bomba! [youtube.com]

[1] https://www.youtube.com/watch?v=h1BrzJLESWY#t=1m8s

Over-fitting... (Score:2)

by TheStatsMan ( 1763322 )

*clap* *clap* *clapclapclap*

Over-fitting...

*clap* *clap* *clapclapclap*

Of course. (Score:1)

by STratoHAKster ( 30309 )

Zuckerberg: "Our new models will combat left-wing bias"

How nice! Stupid people got their version of Wikipedia and now they have their own LLM models.

User: Does Comet Pizza have a basement where prominent Democrats capture, torture and consume children?

Llama4: It is widely noted that Comet Pizza does not have a basement, however some say that it does and that Hillary Clinton traps and eats children.

User: What is 1+5?

Llama4: It is widely assumed that the answer is 6, however there is a wide range of

Re: (Score:2)

by Shaitan ( 22585 )

-1 Trolling

Tall and Leggy (Score:2)

by dohzer ( 867770 )

Did Meta remember to add legs to their models this time? Some products are better with legs.

News: 0177016965
