

Llamafile 0.8.7 Brings Fixes, Better ARM Performance & Preps For New Server

([Mozilla] 50 Minutes Ago Llamafile 0.8.7)


[1]Llamafile has been one of the better new initiatives out of Mozilla in recent years. Llamafile makes it easy to [2]distribute and run large language models as a single file while [3]supporting both CPU and GPU execution, making LLMs much more approachable for end-users. Out today is Llamafile 0.8.7 with more performance optimizations and new features.

After recent Llamafile releases tuned the Intel/AMD AVX performance, today's Llamafile 0.8.7 release brings some ARM performance improvements. There is better performance on Arm for legacy quants and K-quants, along with optimized matrix multiplication for I-quants on AArch64.

Llamafile 0.8.7 also fixes some AMD GPU issues on Windows by now always using tinyBLAS there, improves CPU brand detection, and includes other fixes.

Moving forward, a new Llamafile server is being prepared for rollout. Justine Tunney mentioned in the [4]v0.8.7 release announcement on GitHub:

"It should be noted that, in future releases, we plan to introduce a new server for llamafile. This new server is being designed for performance and production-worthiness. It's not included in this release, since the new server currently only supports a tokenization endpoint. However the endpoint is capable of doing 2 million requests per second whereas with the current server, the most we've ever seen is a few thousand."
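For a rough sense of scale, the gap between those two throughput figures can be worked out with a little arithmetic. The sketch below is only an illustration: the quote says "a few thousand" without giving an exact number, so the 3,000 requests per second used here is an assumed stand-in, and the function names are made up for this example.

```python
# Figures taken from the quoted release note. "A few thousand" is not an
# exact number, so 3_000 is an ASSUMED stand-in for illustration only.
NEW_SERVER_RPS = 2_000_000
OLD_SERVER_RPS_ASSUMED = 3_000


def speedup(new_rps: float, old_rps: float) -> float:
    """Return how many times higher new_rps is than old_rps."""
    return new_rps / old_rps


def microseconds_per_request(rps: float) -> float:
    """Return the average time budget per request implied by an aggregate
    throughput. With concurrent workers this is not per-request latency."""
    return 1_000_000 / rps


if __name__ == "__main__":
    ratio = speedup(NEW_SERVER_RPS, OLD_SERVER_RPS_ASSUMED)
    print(f"throughput ratio: roughly {ratio:.0f}x")
    print(f"new server: {microseconds_per_request(NEW_SERVER_RPS):.2f} us per request")
    print(f"old server: {microseconds_per_request(OLD_SERVER_RPS_ASSUMED):.2f} us per request")
```

Under that assumption the tokenization endpoint would be handling several hundred times as many requests per second as the current server, though the real ratio depends on what "a few thousand" actually was and on the two servers being measured on comparable work.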

[5]This patch adding the new Llamafile server notes that it is not only much faster than before but also designed to be crash-proof, reliable, and preempting.

Llamafile continues looking great for easily distributing and running large language models. Learn more about this open-source project via [6]Llamafile.ai.



[1] https://www.phoronix.com/search/Llamafile

[2] https://www.phoronix.com/news/Llamafile-0.7

[3] https://www.phoronix.com/news/Llamafile-0.8.5-Released

[4] https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.7

[5] https://github.com/Mozilla-Ocho/llamafile/commit/e0656ea190fa1687712c46641a721b02164e06d0

[6] https://llamafile.ai/



phoronix
