
ollama 0.11.9 Introducing A Nice CPU/GPU Performance Optimization

([Programming] 3 Hours Ago ollama 0.11.9-rc0)


ollama, the open-source software that makes it easy to run a range of AI large language models (LLMs) across different operating systems and hardware, is about to enjoy a nice speed boost.

The ollama 0.11.9-rc0 test release was tagged a short time ago and improves performance by overlapping GPU and CPU computations.

This ollama optimization comes from VMware engineer Daniel Hiltgen. It builds the graph for the next batch asynchronously, which helps keep the GPU busy. Hiltgen explained in [1]the pull request last month:

"This refactors the main run loop of the ollama runner to perform the main GPU intensive tasks (Compute+Floats) in a go routine so we can prepare the next batch in parallel to reduce the amount of time the GPU stalls waiting for the next batch of work.

On metal, I see a 2-3% speedup in token rate. On a single RTX 4090 I see a ~7% speedup."
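To illustrate the general idea described in the quote, here is a minimal Go sketch of overlapping batch preparation with compute. It is not the code from the pull request: the `batch` type and the `prepare`, `compute`, `runSerial`, and `runPipelined` functions are hypothetical names, and the sleeps are stand-ins for real CPU and GPU work.

```go
package main

import (
	"fmt"
	"time"
)

// batch is a hypothetical stand-in for a prepared unit of work,
// such as a built compute graph. It is not an ollama type.
type batch struct{ id int }

// prepare simulates the CPU-side work of building the next batch.
func prepare(id int) batch {
	time.Sleep(10 * time.Millisecond)
	return batch{id: id}
}

// compute simulates the GPU-intensive step for one batch.
func compute(b batch) int {
	time.Sleep(30 * time.Millisecond)
	return b.id * 2
}

// runSerial prepares and computes each batch back to back, so the
// simulated GPU sits idle during every prepare call.
func runSerial(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, compute(prepare(i)))
	}
	return out
}

// runPipelined moves compute into its own goroutine so the caller can
// prepare batch i+1 while batch i is still being computed.
func runPipelined(n int) []int {
	ready := make(chan batch, 1) // one prepared batch may wait in the queue
	done := make(chan []int)

	go func() {
		out := make([]int, 0, n)
		for b := range ready { // single consumer, so order is preserved
			out = append(out, compute(b))
		}
		done <- out
	}()

	for i := 0; i < n; i++ {
		ready <- prepare(i)
	}
	close(ready)
	return <-done
}

func main() {
	start := time.Now()
	serial := runSerial(5)
	serialTime := time.Since(start)

	start = time.Now()
	piped := runPipelined(5)
	pipedTime := time.Since(start)

	fmt.Println("serial:   ", serial, serialTime)
	fmt.Println("pipelined:", piped, pipedTime)
}
```

Both functions return the same results. The pipelined version should finish sooner in this toy setup because preparation time is hidden behind compute time, though the gain in a real runner depends on how the two costs compare.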

Around 7% better performance on an NVIDIA GeForce RTX 4090 is significant, and other higher-end GPU models should also see nice gains from this change, since it helps keep the GPU supplied with work.

The ollama 0.11.9-rc0 release also fixes an issue where unrecognized AMD GPUs would cause ollama to error out, along with some crashes caused by unhandled errors in some Mac and Linux ollama installations.

Downloads and more details on this ollama test release are available via [2]GitHub.



[1] https://github.com/ollama/ollama/pull/11863

[2] https://github.com/ollama/ollama/releases/tag/v0.11.9-rc0


