

KTransformers Adds AVX2 MoE Support For Viable Performance On CPUs Without AMX/AVX-512

([AI] 6 Hours Ago KTransformers 0.5.3)


KTransformers 0.5.3 was released today. KTransformers is a framework for efficient inference and fine-tuning of large language models (LLMs) with a focus on CPU-GPU heterogeneous computing. With this release it becomes more useful on CPUs lacking Advanced Matrix Extensions (AMX) and AVX-512, since it now provides some AVX2-only kernels too.

KTransformers 0.5.3 introduces AVX2-only inference support for Mixture of Experts (MoE) models, covering BF16, FP8, and GPTQ-INT4 MoE workloads. This is very beneficial for current and recent-generation Intel Core (Ultra) processors, which lack AVX-512, unlike the latest Xeon servers with AMX and AVX-512 or AMD Zen 4/5 CPUs, which also have AVX-512. A CPU with AVX-512 or AMX will still yield much greater CPU-based AI inference performance.
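
For readers unsure which of these instruction-set tiers their own CPU falls into, the following is a minimal sketch that classifies a Linux machine from the flags listed in /proc/cpuinfo. It is not KTransformers' own detection logic; the function names `parse_cpu_flags` and `simd_tier` are hypothetical, and the only assumptions are the standard Linux flag names (`avx2`, `avx512f`, `amx_tile`, `amx_bf16`, `amx_int8`).

```python
from pathlib import Path

# Flag names as the Linux kernel reports them in /proc/cpuinfo on x86.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set of the first CPU entry in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact "flags" key, not e.g. "vmx flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_tier(flags: set[str]) -> str:
    """Map a flag set to the highest tier mentioned in the article."""
    if flags & AMX_FLAGS:
        return "amx"
    if "avx512f" in flags:  # AVX-512 Foundation
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "none"


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; treated as empty elsewhere
    text = path.read_text() if path.exists() else ""
    print(simd_tier(parse_cpu_flags(text)))
```

A result of `avx2` suggests the machine is in the class of processors the new AVX2-only kernels are aimed at; whether KTransformers then selects those kernels is governed by its own build and runtime configuration, which this sketch does not inspect.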

[1]This pull request recently introduced the AVX2 inference support for kt-kernel. [2]This new documentation outlines running KTransformers on AVX2 processors for those interested.

KTransformers 0.5.3 also brings NUMA-aware deployment improvements for finer-grained NUMA mapping in multi-socket environments, lower idle CPU overhead, speculative decode enhancements, and various other improvements.

Those interested can find KTransformers 0.5.3 downloads and all the release details over on [3]GitHub.



[1] https://github.com/kvcache-ai/ktransformers/pull/1892

[2] https://github.com/kvcache-ai/ktransformers/blob/9b876754395fd047b6083417949410a39cf60290/doc/en/kt-kernel/AVX2-Tutorial.md

[3] https://github.com/kvcache-ai/ktransformers/releases/tag/v0.5.3


