
KTransformers Adds AVX2 MoE Support For Viable Performance On CPUs Without AMX/AVX-512

([AI] 6 Hours Ago KTransformers 0.5.3)


KTransformers 0.5.3 was released today for this framework for efficient inferencing and fine-tuning of large language models (LLMs) with a focus on CPU-GPU heterogeneous computing. With this release, KTransformers is now more applicable to CPUs lacking Advanced Matrix Extensions (AMX) and AVX-512 by providing some AVX2-only kernels too.

KTransformers 0.5.3 introduces AVX2-only inference support for Mixture of Experts (MoE) models, covering BF16, FP8, and GPTQ-INT4 MoE workloads. This is very beneficial for current and recent generation Intel Core (Ultra) processors lacking AVX-512, in contrast to the latest Xeon servers with AMX and AVX-512 or AMD Zen 4/5 CPUs that also have AVX-512. Obviously, though, going for a CPU with AVX-512 or AMX will yield much greater CPU-based AI inferencing performance.
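KTransformers selects its kernels internally, so purely as an illustration: a minimal sketch of how one might check which SIMD tier a CPU exposes before choosing a kernel path, assuming Linux's /proc/cpuinfo flag names (`amx_tile`, `avx512f`, `avx2`). The tier names here are hypothetical labels, not KTransformers identifiers.

```python
def cpu_flags(cpuinfo_text):
    """Extract the feature-flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def simd_tier(flags):
    """Map a set of CPU feature flags to the best available SIMD tier."""
    flags = set(flags)
    if "amx_tile" in flags:      # Advanced Matrix Extensions (Sapphire Rapids+)
        return "amx"
    if "avx512f" in flags:       # AVX-512 Foundation
        return "avx512"
    if "avx2" in flags:          # the new 0.5.3 fallback territory
        return "avx2"
    return "generic"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(simd_tier(cpu_flags(f.read())))
```

On a current desktop Core Ultra chip this would typically report `avx2`, which is exactly the class of hardware the new kernels target.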

[1]This pull is what recently introduced the AVX2 inference support for kt-kernel. [2]This new documentation outlines running KTransformers on AVX2 processors for those interested.

KTransformers 0.5.3 also brings NUMA-aware deployment improvements for finer-grained NUMA mapping in multi-socket environments, lower idle CPU overhead, speculative decode enhancements, and various other improvements.
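The finer-grained NUMA mapping is handled inside KTransformers itself; as a hedged sketch only, this shows how one could enumerate the NUMA nodes the Linux kernel exposes via sysfs, which is the topology information any such multi-socket mapping starts from. The sysfs path is the standard kernel location, but the helper itself is hypothetical.

```python
import os
import re


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node IDs exposed by the kernel, or [] if none."""
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(
        int(m.group(1))
        for name in os.listdir(sysfs_root)
        if (m := re.fullmatch(r"node(\d+)", name))
    )


if __name__ == "__main__":
    # A single-socket desktop usually reports [0]; a dual-socket Xeon, [0, 1].
    print(numa_nodes())
```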

Those interested can find KTransformers 0.5.3 downloads and all the release details over on [3]GitHub.



[1] https://github.com/kvcache-ai/ktransformers/pull/1892

[2] https://github.com/kvcache-ai/ktransformers/blob/9b876754395fd047b6083417949410a39cf60290/doc/en/kt-kernel/AVX2-Tutorial.md

[3] https://github.com/kvcache-ai/ktransformers/releases/tag/v0.5.3
