
Inception Emerges From Stealth With a New Type of AI Model

(Wednesday February 26, 2025 @10:30PM (BeauHD) from the new-challenger-emerges dept.)


Inception, a Palo Alto-based AI company founded by Stanford professor Stefano Ermon, claims to have developed a novel diffusion-based large language model (DLM) that [1]significantly outperforms traditional LLMs in speed and efficiency. "Inception's model offers the capabilities of traditional LLMs, including code generation and question-answering, but with significantly faster performance and reduced computing costs, according to the company," reports TechCrunch. From the report:

> Ermon hypothesized generating and modifying large blocks of text in parallel was possible with diffusion models. After years of trying, Ermon and a student of his achieved a major breakthrough, which they detailed in a [2]research paper published last year. Recognizing the advancement's potential, Ermon founded Inception last summer, tapping two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company. [...]

>

> "What we found is that our models can leverage the GPUs much more efficiently," Ermon said, referring to the computer chips commonly used to run models in production. "I think this is a big deal. This is going to change the way people build language models." Inception offers an API as well as on-premises and edge device deployment options, support for model fine-tuning, and a suite of out-of-the-box DLMs for various use cases. The company claims its DLMs can run up to 10x faster than traditional LLMs while costing 10x less. "Our 'small' coding model is as good as [OpenAI's] GPT-4o mini while more than 10 times as fast," a company spokesperson told TechCrunch. "Our 'mini' model outperforms small open-source models like [Meta's] Llama 3.1 8B and achieves more than 1,000 tokens per second."



[1] https://techcrunch.com/2025/02/26/inception-emerges-from-stealth-with-a-new-type-of-ai-model/

[2] https://arxiv.org/pdf/2310.16834



10x less?? (Score:3)

by henrik stigell ( 6146516 )

"while costing 10x less"

Was it an AI that wrote that? Things don't cost "10x less"; they cost "90% less", or the cost is "one tenth" of something else. Duh!

Bah Humbug! (Score:1)

by louzer ( 1006689 )

- Slashdot probably

Re: (Score:3)

by PsychoSlashDot ( 207849 )

> - Slashdot probably

Well, sure. Because so far this AI bubble is mostly unreliable hype.

Image generation models are impressive because there's no "right" and "wrong". There's just "close enough" or "not close enough". But LLMs are exactly that: language models. They're impressive language-parsing tools, but they're often applied to tasks that actually require precision, which is not what they're designed for.

The important next step - I think - is some kind of LFM: Large Fact Model. If we could tokenize facts and tru

Re: (Score:2)

by blue trane ( 110704 )

Was that dress blue, or black again?

Re: (Score:3)

by ceoyoyo ( 59147 )

A generative model is set up to generate a different answer every time you run it. That's the point. You can make non-generative models with language front ends, that's not a problem. The problem with your "fact model" is figuring out what a fact is.

Since both humans and computers are pretty shit at that, I wouldn't hold my breath.
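For what it's worth, the first point is easy to demonstrate: standard decoding samples from the model's next-token distribution instead of taking the argmax, so repeated runs differ by design. A minimal illustration with a made-up distribution (no real model involved):

    # Sampling vs. greedy decoding: the sampled answer varies run to run,
    # the greedy one never does. Probabilities here are invented for the demo.
    import random

    tokens = ["blue", "black", "gold", "white"]
    probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical next-token probabilities

    for run in range(3):
        sampled = random.choices(tokens, weights=probs, k=1)[0]  # stochastic
        greedy = tokens[probs.index(max(probs))]                 # deterministic
        print(f"run {run}: sampled={sampled!r}, greedy={greedy!r}")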

Re: Bah Humbug! (Score:2)

by Big Hairy Gorilla ( 9839972 )

Did you just invent JSON for agent-to-agent exchange of factoid objects? Data structures?
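For the sake of argument, here's roughly what such a factoid object might look like on the wire; the schema is invented for illustration and isn't any real or proposed standard:

    # A hypothetical "factoid" record with provenance and confidence attached,
    # serialized as JSON for agent-to-agent exchange. Made-up schema.
    import json

    factoid = {
        "subject": "the dress",
        "predicate": "color",
        "object": "blue and black",
        "source": "https://en.wikipedia.org/wiki/The_dress",
        "confidence": 0.9,  # hypothetical calibration score
    }
    wire = json.dumps(factoid)    # what one agent would send
    received = json.loads(wire)   # what the other agent would parse
    print(received["object"])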

It's fast, but still limited to basic tasks (Score:3)

by molarmass192 ( 608071 )

I asked it to generate a transformer implementation for DeepSeek R1 and it spat out a whole lot of: // This is a placeholder for the actual implementation

Like other codegen models, it doesn't go much beyond basic common coding tasks. Even for basic tasks in anything but JavaScript, the code doesn't compile cleanly.

Negatives aside, it is an interesting thesis, and I like the direction they're taking. I skimmed the paper, but I think DeepSeek's MoE approach tackles the same weight distribution optimization in a more elegant way. In a nutshell, it's not the CPU or memory capacity that's the limiting factor; it's that attention mechanisms jump around in memory and overload the bus IO.
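The bus-IO point checks out with back-of-envelope arithmetic: in autoregressive decoding, every new token re-reads the entire KV cache. Using the published Llama 3.1 8B shape and an assumed ~2 TB/s of HBM bandwidth (both just for illustration):

    # Why sequential decoding is memory-bandwidth bound: each decoded token
    # re-reads the whole KV cache. Model shape is Llama 3.1 8B; the bandwidth
    # figure is an assumption, roughly an A100/H100-class GPU.
    layers, kv_heads, head_dim = 32, 8, 128  # Llama 3.1 8B (GQA)
    dtype_bytes = 2                          # fp16
    context = 8192                           # tokens already in the cache

    kv_per_pos = 2 * kv_heads * head_dim * dtype_bytes * layers  # K and V, all layers
    bytes_per_token = kv_per_pos * context                       # read per decoded token
    print(f"KV cache read per token: {bytes_per_token / 2**30:.2f} GiB")

    hbm_bw = 2e12  # ~2 TB/s, assumed
    print(f"Bandwidth-bound ceiling: ~{hbm_bw / bytes_per_token:.0f} tokens/s")

That's about 1 GiB of cache traffic per decoded token at 8K context, which is where schemes that produce many tokens per forward pass get their headroom.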

Parallel processing (Score:3)

by Big Hairy Gorilla ( 9839972 )

Somehow this does not seem surprising; optimizations of some sort were guaranteed to come sooner or later. The anxiety to grab the next thing is palpable lately, and this fits right in. The article says they already have customers lined up... hmmm... it's presented as a skunkworks coming out of stealth, so a list of paying customers seems premature. The proof will be in the release, but you can't help but notice the hype.
