

Llamafile 0.8.5 Delivers Greater Performance: Tiny Models 2x Faster On Threadripper

([Programming] 5 Hours Ago Llamafile 0.8.5)


The Mozilla Ocho group has published the newest version of [1]Llamafile, the open-source project that makes it easy to distribute and run large language models (LLMs) as a single file. Llamafile is an excellent solution for sharing and running LLMs, supporting both fast CPU-based execution and GPU acceleration where available.

Llamafile 0.8.5 delivers yet more performance tuning, on top of the recent work around [2]AVX2 optimizations, [3]more AMD GPU offloading, and other improvements. Justine Tunney explained the latest performance work in Llamafile 0.8.5:

"As of #435 the K quants now go consistently 2x faster than llama.cpp upstream. On big CPUs like Threadripper we've doubled the performance of tiny models, for both prompt processing and token generation for tiny models."

Doubling the performance for tiny models on AMD Ryzen Threadripper class hardware!

[4]HP Z6 G5 A with AMD Ryzen Threadripper PRO 7000 series

Llamafile 0.8.5 also delivers faster AVX2 matrix multiplication for MoE models and legacy quants. There are also some AMD Zen 4 performance optimizations, BF16 NVIDIA CUDA support, and other improvements.

Downloads and more details on the Llamafile 0.8.5 release are available via [5]GitHub. I'll be working on new [6]Llamafile benchmarks soon.



[1] https://www.phoronix.com/search/Llamafile

[2] https://www.phoronix.com/news/Llamafile-0.8.2-More-AVX2

[3] https://www.phoronix.com/news/Llamafile-0.8.1-Released

[4] https://www.phoronix.com/review/hp-z6-g5-a

[5] https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.5

[6] https://openbenchmarking.org/test/pts/llamafile


