Google Unveils Gemini 2.5 Pro, Its Latest AI Reasoning Model With Significant Benchmark Gains (blog.google)
- Reference: 0176814351
- News link: https://tech.slashdot.org/story/25/03/25/195227/google-unveils-gemini-25-pro-its-latest-ai-reasoning-model-with-significant-benchmark-gains
- Source link: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking
For developers, Gemini 2.5 Pro demonstrates improved coding abilities with 63.8% on SWE-Bench Verified using a custom agent setup, though this falls short of Anthropic's Claude 3.7 Sonnet score of 70.3%. On Aider Polyglot for code editing, it scores 68.6%, which Google claims surpasses competing models. The reasoning approach builds on Google's previous experiments with reinforcement learning and chain-of-thought prompting. These techniques allow the model to analyze information, incorporate context, and draw conclusions before delivering responses. Gemini 2.5 Pro ships with a 1 million token context window (approximately 750,000 words). The model is available immediately in Google AI Studio and for Gemini Advanced subscribers, with Vertex AI integration planned in the coming weeks.
[1] https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking
there is that stupid shit again (Score:2)
"...designed to "think" before responding to queries..."
Literally every piece of software EVER was "designed to think before responding to queries". It is impossible to do otherwise.
I am so sick of this anthropomorphizing of AI. It is computer software.
"...demonstrates enhanced reasoning capabilities across technical tasks."
Does better than some other things at some tasks.
"For developers, Gemini 2.5 Pro demonstrates improved coding abilities ..."
Not to be confused with "coding abilities" of developers.
"Th
Re: (Score:1)
In this case, 'reasoning' describes a specific technique used to improve LLMs (https://en.wikipedia.org/wiki/Reasoning_language_model). You may disagree with the name, but it isn't just marketing hype; it's what the technique is called in the industry.
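To make that concrete: a "reasoning" model emits an intermediate chain-of-thought before its final answer, and the serving layer separates the trace from the answer. Here is a minimal sketch in Python, assuming the hypothetical convention of `<think>...</think>` delimiters used by some open reasoning models; Google does not document Gemini's internal format:

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Separate a model's chain-of-thought trace from its final answer.

    Assumes the (hypothetical) convention that the reasoning trace is
    wrapped in <think>...</think> tags, as some open reasoning models do.
    """
    match = re.search(r"<think>(.*?)</think>", raw_output, re.DOTALL)
    if match:
        thought = match.group(1).strip()      # hidden reasoning trace
        answer = raw_output[match.end():].strip()  # text after the trace
        return thought, answer
    # No trace found: treat the whole output as the answer.
    return "", raw_output.strip()

raw = ("<think>63.8 is less than 70.3, so Sonnet scores higher.</think>"
       "Claude 3.7 Sonnet scores higher on SWE-Bench Verified.")
thought, answer = split_reasoning(raw)
```

The point of the technique is that the model is trained (typically with reinforcement learning) to spend tokens on the trace before committing to an answer, which is what "think before responding" refers to.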
Benchmarks are meaningless (Score:2)
Whenever a peddler of LLM-crap stresses their artificial moron is doing better on benchmarks, that just means they have given up and are cheating now.
Re: (Score:2)
Are you still doing this? Move on, man. The world has.
Your contentless ranting is just noise that pollutes Slashdot.
Re: (Score:2)
Have you never used benchmarks in your life? When putting together your new system, don't you look at how well the various components perform? When hiring for a position, don't you look at candidates' credentials and what they've done? When judging which car to buy, don't you look at its 0-60 times, its fuel mileage, its reliability?
Explain how one is to gauge the good or bad of something without a consistent benchmark to compare against.