Llamafile 0.8.7 Brings Fixes, Better ARM Performance & Preps For New Server


[1]Llamafile has been one of the better new initiatives out of Mozilla in recent years. Llamafile makes it easy to [2]distribute and run large language models as a single file while [3]supporting both CPU and GPU execution, making LLMs much more approachable for end-users. Out today is Llamafile 0.8.7 with more performance optimizations and new features.

After recent Llamafile releases focused on tuning Intel/AMD AVX performance, today's Llamafile 0.8.7 release brings Arm performance improvements. Legacy quants and K-quants are now faster on Arm, and there is optimized matrix multiplication for I-quants on AArch64.

Llamafile 0.8.7 also fixes some AMD GPU issues on Windows by now always using tinyBLAS there, improves CPU brand detection, and includes other fixes.

Moving forward, a new Llamafile server is being prepared for rollout. Justine Tunney mentioned in the [4]v0.8.7 release announcement on GitHub:

"It should be noted that, in future releases, we plan to introduce a new server for llamafile. This new server is being designed for performance and production-worthiness. It's not included in this release, since the new server currently only supports a tokenization endpoint. However the endpoint is capable of doing 2 million requests per second whereas with the current server, the most we've ever seen is a few thousand."

[5]The patch adding the new Llamafile server notes that it is not only much faster than the current server but also designed to be crash-proof, reliable, and preempting.

Llamafile continues to look great for easily distributing and running large language models. Learn more about this open-source project via [6]Llamafile.ai.



[1] https://www.phoronix.com/search/Llamafile

[2] https://www.phoronix.com/news/Llamafile-0.7

[3] https://www.phoronix.com/news/Llamafile-0.8.5-Released

[4] https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.7

[5] https://github.com/Mozilla-Ocho/llamafile/commit/e0656ea190fa1687712c46641a721b02164e06d0

[6] https://llamafile.ai/



phoronix
