Mistral boasts code-proofing agent offers champagne performance on a budget bière

(2026/03/17)


Your AI may need AI to oversee its work. Gallic AI biz Mistral is leaning into making AI code generation more reliable with Leanstral, a coding agent for proofs constructed using the open source [1]Lean programming language.

Formal code verification, Mistral [2]argues, reduces the need for human code review, a potentially time-consuming process. Proofs, tests, linting, and specifications can help ground AI code agents in reality so that they produce better output.
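For a flavor of what grounding in proofs looks like: in Lean, a property of a function can be stated and machine-checked right next to its definition. A minimal sketch in plain Lean 4 (the function and theorem names are illustrative, not from Mistral's examples):

```lean
-- A trivially small function and a machine-checked spec for it.
def double (n : Nat) : Nat := n + n

-- The proof is checked by the compiler; if `double` were wrong,
-- this theorem would fail to elaborate and the build would stop.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

A reviewer no longer has to eyeball `double` line by line: the statement `double n = 2 * n` is the review, and the checker enforces it.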

Leanstral has been released with open weights (Apache 2.0) as an agent mode within [3]Mistral Vibe, and via a free API endpoint. It is accompanied by results from an as-yet-unreleased benchmark test called FLTEval, designed to evaluate how AI models handle engineering proofs.

According to Mistral, Leanstral-120B-A6B outperforms larger (more parameters) open source rivals like GLM5-744B-A40B, Kimi-K2.5-1T-32B, and Qwen3.5-397B-A17B on FLTEval.

But perhaps more noteworthy is Leanstral's effect on one's bank account.

"Leanstral serves as a high-value alternative to the Claude suite, offering competitive performance at a fraction of the price: Leanstral pass@2 reaches a score of 26.3, beating Sonnet by 2.6 points, while costing only $36 to run, compared to Sonnet's $549," the AI biz claims. "At pass@16, Leanstral reaches a score of 31.9, comfortably beating Sonnet by 8 points."

As for Claude Opus 4.6, Anthropic's premium model at the moment, it scores higher on FLTEval than Leanstral (39.6 versus 31.9 at pass@16). But Opus will cost $1,650, compared to $290 for Leanstral at 16 passes, or $18 for a single pass at a score of 21.9.
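Taken at face value, the quoted figures work out to a large gap in cost per benchmark point. A quick back-of-the-envelope check (scores and dollar amounts as reported by Mistral; the per-point division and rounding are ours):

```python
# Rough cost-per-point comparison using the figures quoted above:
# (FLTEval score, benchmark run cost in USD) as reported by Mistral.
models = {
    "Leanstral pass@1": (21.9, 18),
    "Leanstral pass@16": (31.9, 290),
    "Claude Opus 4.6 pass@16": (39.6, 1650),
}

for name, (score, cost) in models.items():
    print(f"{name}: ${cost / score:.2f} per FLTEval point")
```

On these numbers, Opus's per-point cost is roughly 40 times Leanstral's single-pass figure, which is the gap Mistral is leaning on.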

As proof of Leanstral's adept handling of test-driven development, Mistral had the coding agent tackle [11]an actual question from the Proof Assistant Stack Exchange about a bug in some Lean 4 code.

The company reports that Leanstral successfully built the test code to reproduce the failure and then correctly spotted and fixed the flaw.
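The write-a-failing-test-then-fix loop the company describes maps naturally onto Lean's `#guard` command, which fails compilation when its Boolean is false. An illustrative sketch (the bug here is invented for demonstration, not the one from the Stack Exchange question):

```lean
-- Hypothetical bug: the function was meant to return the last element
-- of the list, but was written to return the head instead.
def lastOrZeroBuggy (xs : List Nat) : Nat := xs.headD 0

-- A test reproducing the failure; uncommented, this line would
-- stop the build, flagging the bug.
-- #guard lastOrZeroBuggy [1, 2, 3] == 3

-- The fix: actually take the last element.
def lastOrZero (xs : List Nat) : Nat := xs.getLastD 0

#guard lastOrZero [1, 2, 3] == 3  -- passes, so the file compiles
```

Because the test lives in the same file as the code and is checked at compile time, an agent iterating on the file gets immediate, unambiguous feedback.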

Mistral also released [13]Mistral Small 4, designed as an all-in-one model that can handle reasoning, coding, and instruct/chat tasks, so you don't have to switch between specialized models. ®



[1] https://lean-lang.org

[2] https://mistral.ai/news/leanstral

[3] https://mistral.ai/products/vibe

[11] https://proofassistants.stackexchange.com/questions/6471/a-strange-issue-with-a-type-alias-in-lean4

[13] https://mistral.ai/news/mistral-small-4



Time

elsergiovolador

These local models are largely a waste of time. They bear no comparison to closed-source frontier models. They might look good on benchmarks, but real-life use is lacking.

Like that dumb friend who memorised test answers.

Sure, you can find a use for some narrow case, but you'll probably be better off just resolving the issue algorithmically.

Intelligence required

Claptrap314

Formal methods are great, but they have to be driven by trained intelligence. It is shockingly easy to come up with proofs that are actually vacuous, for instance.
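The vacuity point is easy to demonstrate: a theorem whose hypothesis can never hold is accepted by the checker yet asserts nothing. An illustrative Lean example:

```lean
-- This elaborates fine, but it is vacuous: no natural number
-- satisfies `n < 0`, so the conclusion is never actually asserted.
theorem looks_strong (n : Nat) (h : n < 0) : n * n = 42 :=
  absurd h (Nat.not_lt_zero n)
```

The checker only guarantees that the conclusion follows from the hypotheses; whether the statement means anything is still a job for a human.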

TL;DR: More FakeI Hype.

"Looks clean and obviously correct to me, but then _everything_ I write always looks obviously correct to me."

- Linus