'Forget ChatGPT: Why Researchers Now Run Small AIs On Their Laptops' (nature.com)
- Reference: 0175109411
- News link: https://slashdot.org/story/24/09/23/0452250/forget-chatgpt-why-researchers-now-run-small-ais-on-their-laptops
- Source link: https://www.nature.com/articles/d41586-024-02998-y
> Two more recent trends have blossomed. First, organizations are making 'open weights' versions of LLMs, in which the weights and biases used to train a model are publicly available, so that users can download and run them locally, if they have the computing power. Second, technology firms are making scaled-down versions that can be run on consumer hardware — and that rival the performance of older, larger models. Researchers might use such tools to save money, protect the confidentiality of patients or corporations, or ensure reproducibility... As computers get faster and models become more efficient, people will increasingly have AIs running on their laptops or mobile devices for all but the most intensive needs. Scientists will finally have AI assistants at their fingertips — but the actual algorithms, not just remote access to them.
The article's list of small open-weights models includes Meta's Llama, Google DeepMind's Gemma, Alibaba's Qwen, Apple's DCLM, Mistral's NeMo, and OLMo from the Allen Institute for AI. And then there's Microsoft:
> Although the California tech firm OpenAI hasn't open-weighted its current GPT models, its partner Microsoft in Redmond, Washington, has been on a spree, releasing the small language models Phi-1, Phi-1.5 and Phi-2 in 2023, then four versions of Phi-3 and three versions of Phi-3.5 this year. The Phi-3 and Phi-3.5 models have between 3.8 billion and 14 billion active parameters, and two models (Phi-3-vision and Phi-3.5-vision) handle images [1]. By some benchmarks, even the smallest Phi model outperforms OpenAI's GPT-3.5 Turbo from 2023, rumoured to have 20 billion parameters... Microsoft used LLMs to write millions of short stories and textbooks in which one thing builds on another. The result of training on this text, says Sébastien Bubeck, Microsoft's vice-president for generative AI, is a model that fits on a mobile phone but has the power of the initial 2022 version of ChatGPT. "If you are able to craft a data set that is very rich in those reasoning tokens, then the signal will be much richer," he says...
>
> Sharon Machlis, a former editor at the website InfoWorld, who lives in Framingham, Massachusetts, wrote a guide to using LLMs locally [2], covering a dozen options.
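The parameter counts quoted above translate directly into hardware requirements, since the dominant cost of running a model locally is holding its weights in memory. A back-of-envelope sketch (the 4-bit and 16-bit figures are common quantization choices for local inference, not numbers from the article):

```python
def model_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Rough memory needed just to hold the weights, in GiB.

    Ignores activation memory and KV cache, so real usage is somewhat higher.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# A Phi-3-mini-class model (3.8 billion parameters):
print(round(model_memory_gib(3.8, 16), 1))  # 16-bit weights: ~7.1 GiB
print(round(model_memory_gib(3.8, 4), 1))   # 4-bit quantized: ~1.8 GiB

# A 14-billion-parameter model, 4-bit quantized:
print(round(model_memory_gib(14, 4), 1))    # ~6.5 GiB
```

This is why 4-bit quantization is the usual route to "fits on a laptop or phone": a 3.8B model drops from roughly 7 GiB to under 2 GiB of weight storage.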
A bioinformatician quoted in the article shares another benefit: you don't have to worry about the company updating its models (and thereby changing your outputs). "In most of science, you want things that are reproducible. And it's always a worry if you're not in control of the reproducibility of what you're generating."
And finally, the article reminds readers that "Researchers can build on these tools to create custom applications..."
> Whichever approach you choose, local LLMs should soon be good enough for most applications, says Stephen Hood, who heads open-source AI at the tech firm Mozilla in San Francisco. "The rate of progress on those over the past year has been astounding," he says. As for what those applications might be, that's for users to decide. "Don't be afraid to get your hands dirty," Zakka says. "You might be pleasantly surprised by the results."
[1] https://www.nature.com/articles/d41586-024-02998-y
[2] https://www.infoworld.com/article/2338922/5-easy-ways-to-run-an-llm-locally.html
Open weights (Score:2)
I think the biggest benefit is being able to fine-tune open-weights models. You can do additional training and apply the changes as a LoRA (or, on consumer hardware, more likely a QLoRA). Instead of relying on sleight of hand (are you actually talking to GPT-4o mini, and if so, have they tacked on additional layers of guardrails since you last sent a query?) you can use checkpointed models, and then tune them specifically for the tasks you want to accomplish.
And from a performance perspective, I think fine-tuning this way is cheap.
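The reason LoRA-style fine-tuning is cheap is that it freezes the pretrained weight matrix and trains only a low-rank correction. A toy NumPy sketch of the idea (the layer sizes and scaling are illustrative, not taken from any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 4096, 4096, 8   # typical layer width; small LoRA rank
alpha = 16                       # LoRA scaling factor

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # B starts at zero, so the
                                           # adapter is a no-op initially

def adapted_forward(x):
    # Base layer plus low-rank update: (W + (alpha / r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0, the adapted layer matches the frozen layer exactly.
assert np.allclose(adapted_forward(x), W @ x)

full_params = d_in * d_out
lora_params = r * (d_in + d_out)
print(f"trainable params: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}% of full fine-tuning)")
```

Only `A` and `B` are updated during training, so for this layer you'd train about 65,000 parameters instead of nearly 17 million; QLoRA pushes the cost down further by keeping the frozen `W` in quantized form.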
Re: Open weights (Score:1)
What do you tune? To what? I think you tune the tone of the language either to mimic you better or to tickle your brain better. Either way, it is a rather selfish purpose. If you tune to something else, it might be evil or very commercial :-)
Re: (Score:2)
> But from a customization perspective (and data safety perspective) running inference on prem, on hardware you control, is probably less of a deal breaker than running proprietary data through someone else's model, on someone else's hardware.
That is clearly a benefit: you can use proprietary data without it being used by someone else to enhance a model that a competitor could then use as well.
Summary has been a feature in MacOS since X (Score:1)
They have always been quite readable, but I never could convince myself they didn't miss important facts. Is a bigger model better?
Duh (Score:3)
50 years of "This is complex, we could run it remotely!" and then "this is expensive, we could run it locally!"
Re: (Score:2)
This concept needs a catchy name. Lap-cloud? Moist-lap?