DeepSeek-V3 Now Runs At 20 Tokens Per Second On Mac Studio
(Tuesday March 25, 2025 @06:20PM (BeauHD))
from the nightmare-for-OpenAI dept.)
- Reference: 0176820317
- News link: https://apple.slashdot.org/story/25/03/25/2054214/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio
An anonymous reader quotes a report from VentureBeat:
> Chinese AI startup DeepSeek has quietly released a new large language model that's already sending ripples through the artificial intelligence industry -- not just for its capabilities, but for how it's being deployed. The 641-gigabyte model, dubbed DeepSeek-V3-0324, appeared on AI repository Hugging Face today with virtually no announcement (just an empty [1]README file), continuing the company's pattern of low-key but impactful releases. What makes this launch particularly notable is the model's MIT license -- making it freely available for commercial use -- and early reports that it can run directly on consumer-grade hardware, [2]specifically Apple's Mac Studio with M3 Ultra chip.
>
> "The new DeepSeek-V3-0324 in 4-bit runs at > 20 tokens/second on a 512GB M3 Ultra with mlx-lm!" wrote AI researcher Awni Hannun on social media. While the $9,499 Mac Studio might stretch the definition of "consumer hardware," the ability to run such a massive model locally is a major departure from the data center requirements typically associated with state-of-the-art AI. [...] Simon Willison, a developer tools creator, noted in a blog post that a 4-bit quantized version reduces the storage footprint to 352GB, making it feasible to run on high-end consumer hardware like the Mac Studio with M3 Ultra chip. This represents a potentially significant shift in AI deployment. While traditional AI infrastructure typically relies on multiple Nvidia GPUs consuming several kilowatts of power, the Mac Studio draws less than 200 watts during inference. This efficiency gap suggests the AI industry may need to rethink assumptions about infrastructure requirements for top-tier model performance.
"The implications of an advanced open-source reasoning model cannot be overstated," reports VentureBeat. "Current reasoning models like OpenAI's o1 and DeepSeek's R1 represent the cutting edge of AI capabilities, demonstrating unprecedented problem-solving abilities in domains from mathematics to coding. Making this technology freely available would democratize access to AI systems currently limited to those with substantial budgets."
"If DeepSeek-R2 follows the trajectory set by R1, it could present a direct challenge to GPT-5, OpenAI's next flagship model rumored for release in coming months. The contrast between OpenAI's closed, heavily-funded approach and DeepSeek's open, resource-efficient strategy represents two competing visions for AI's future."
[1] https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/tree/main
[2] https://venturebeat.com/ai/deepseek-v3-now-runs-at-20-tokens-per-second-on-mac-studio-and-thats-a-nightmare-for-openai/
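The storage figures in the story can be roughly sanity-checked. A minimal back-of-the-envelope sketch, assuming DeepSeek-V3's total parameter count of roughly 671 billion (a figure from DeepSeek's own release materials, not from the article) and an effective ~4.2 bits per weight for a 4-bit quantization once per-group scale factors are included:

```python
# Back-of-the-envelope check on the storage figures quoted above.
# Assumptions (not from the article): ~671B total parameters;
# the full-precision release stores roughly 8 bits per weight;
# a 4-bit quantization effectively stores ~4.2 bits per weight
# after per-group scale/zero-point metadata is counted.

PARAMS = 671e9  # total parameters (assumed, from DeepSeek's release notes)

def size_gb(bits_per_weight: float) -> float:
    """Approximate storage footprint in gigabytes (10^9 bytes)."""
    return PARAMS * bits_per_weight / 8 / 1e9

fp8_gb = size_gb(8)    # ~671 GB, in the ballpark of the 641 GB release
q4_gb = size_gb(4.2)   # ~352 GB, matching Willison's quoted figure

print(f"8-bit: {fp8_gb:.0f} GB, 4-bit quantized: {q4_gb:.0f} GB")
```

The ~352 GB result lines up with the figure Simon Willison reports, which is why the 512GB M3 Ultra configuration has enough unified memory to hold the quantized model with room to spare for activations.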
Consumer hardware (Score:2)
by OrangeTide ( 124937 )
The Mac 512K had an introductory price of $3,195 (equivalent to $9,670 in 2024). I think the collapse of home computer prices in the 1990s and 2000s has altered what we think a reasonable price for "consumer hardware" is.
66 BAUD (Score:2)
by awwshit ( 6214476 )
Is 66 English characters a second fast?
How would you feel about a 66 BAUD modem?
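The arithmetic behind this quip can be sketched. Assuming roughly 3.3 English characters per token (a commonly cited rough average for LLM tokenizers, not a figure from the article), 20 tokens/second works out to about 66 characters/second -- though on a classic async serial line, where each character costs about 10 bits, that would be closer to 660 baud than 66:

```python
# Hedged sketch: converting the article's 20 tokens/second into
# characters/second and a rough serial-line equivalent. The
# chars-per-token figure is an assumption, not from the article.

TOKENS_PER_SEC = 20
CHARS_PER_TOKEN = 3.3   # assumption: rough average for English text
BITS_PER_CHAR = 10      # classic async serial: 8 data + start + stop bits

chars_per_sec = TOKENS_PER_SEC * CHARS_PER_TOKEN      # ~66 chars/s
serial_equiv_baud = chars_per_sec * BITS_PER_CHAR     # ~660 baud

print(f"{chars_per_sec:.0f} chars/s, ~{serial_equiv_baud:.0f} baud equivalent")
```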
Beware of Pooh's Bearing gifts (Score:1)
TFA says it seems as powerful as commercial models, yet is "open source".
But it's probably Xi's way to inject and/or vacuum our content and prompts, so it's kind of equivalent to a commercial product.
Re: (Score:2)
That might be true, but wouldn't a code audit reveal such problems?
Re: (Score:1)
By the time the audit is done, version 4 will be out.
Re: (Score:1)
As Tom wended to school after breakfast, he was the envy of every boy he met because the gap in his upper row of teeth enabled him to expectorate in a new and admirable way. He gathered quite a following of lads interested in the exhibition; and one that had cut his finger and had been a centre of fascination and homage up to this time, now found himself suddenly without an adherent, and shorn of his glory. His heart was heavy, and he said with a disdain which he did not feel, that it wasn't anything to spi