News: 0001598424


Intel llm-scaler-vllm Beta 1.2 Brings Support For New AI Models On Arc Graphics

([Intel] 31 Minutes Ago llm-scaler-vllm 1.2 beta)


Following yesterday's release of [1]a new llm-scaler-omni beta, there is now a new beta feature release of llm-scaler-vllm, which packages the Intel-optimized version of vLLM in a Docker container that is set up and ready to go for AI on modern Arc Graphics hardware. Today's llm-scaler-vllm 1.2 beta release adds support for a variety of additional large language models (LLMs) along with other improvements.

llm-scaler-vllm continues to be Intel's preferred route for customers looking to leverage vLLM for AI workloads on its discrete graphics hardware. The new llm-scaler-vllm 1.2 beta release brings support for new models along with other enhancements to the Intel vLLM experience:

- Fix 72-hour hang issue

- MoE-Int4 support for Qwen3-30B-A3B (see the serving sketch after this list)

- Bpe-Qwen tokenizer support

- Enable Qwen3-VL Dense/MoE models

- Enable Qwen3-Omni models

- MinerU 2.5 Support

- Enable Whisper transcription models

- Fix MiniCPM-V 4.5 OOM issue and output error

- Enable ERNIE-4.5-VL models

- Enable Glyph-based GLM-4.1V-9B-Base

- Attention kernel optimizations for the decoding phase across all workloads (>10% end-to-end throughput gain on 10+ models across all input/output sequence lengths)

- gpt-oss 20B and 120B support in MXFP4 with optimized performance

- MoE model optimizations for output throughput: Qwen3-30B-A3B 2.6x end-to-end improvement; DeepSeek-V2-Lite 1.5x improvement

- New models: eight additional multi-modality models with image/video support

- vLLM 0.10.2 with new features: P/D disaggregation (experimental), tool calling, reasoning output, and structured output

- FP16/BF16 GEMM optimizations for batch sizes 1-128, with notable improvements at small batch sizes

- Bug fixes
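
As a rough illustration of how one of the newly supported models could be exercised, here is a minimal sketch assuming the llm-scaler-vllm container is already running on an Arc Graphics host and exposing vLLM's standard OpenAI-compatible server on localhost port 8000. The port, served model name, and prompt are illustrative assumptions, not details from the release notes.

    # Minimal sketch: querying a model newly enabled in llm-scaler-vllm 1.2 beta
    # through vLLM's OpenAI-compatible HTTP server. Assumes the container is
    # already up and serving Qwen3-30B-A3B on localhost:8000 (both assumptions).
    from openai import OpenAI

    # vLLM's OpenAI-compatible server accepts a placeholder API key by default.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    response = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B",  # assumed served model name
        messages=[
            {"role": "user", "content": "Summarize what vLLM does in one sentence."}
        ],
        max_tokens=128,
    )

    print(response.choices[0].message.content)

Any of the other newly enabled models in the list above, such as gpt-oss or Qwen3-VL, would be queried the same way once the container is serving them.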

This work will be especially important for next year's [2]Crescent Island hardware release.

More details on the new beta release are available via [3]GitHub, while the llm-scaler-vllm Docker container can be pulled from the Docker Hub container image library.
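
Once the container is up, the structured output support that comes with the vLLM 0.10.2 rebase can be exercised through the same OpenAI-compatible endpoint. Below is a minimal sketch assuming a local server on port 8000 and vLLM's guided-JSON extension parameter; the endpoint, model name, schema, and use of "guided_json" are illustrative assumptions rather than details from the release notes.

    # Minimal sketch of structured (guided JSON) output against a local
    # llm-scaler-vllm server. The base_url, model name, and "guided_json"
    # extra parameter are assumptions for illustration.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    # JSON schema that the generated output must conform to.
    schema = {
        "type": "object",
        "properties": {
            "gpu": {"type": "string"},
            "vram_gb": {"type": "integer"},
        },
        "required": ["gpu", "vram_gb"],
    }

    response = client.chat.completions.create(
        model="Qwen/Qwen3-30B-A3B",  # assumed served model name
        messages=[{"role": "user", "content": "Describe the Intel Arc B580 as JSON."}],
        extra_body={"guided_json": schema},  # vLLM guided decoding extension
        max_tokens=256,
    )

    print(json.loads(response.choices[0].message.content))

Constraining decoding to a schema like this is useful when the model's output feeds directly into other tooling rather than a human reader.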



[1] https://www.phoronix.com/news/Intel-LLM-Scaler-Omni-ComfyUI

[2] https://www.phoronix.com/search/Crescent+Island

[3] https://github.com/intel/llm-scaler/releases/tag/vllm-1.2



"Linux kernel development is dominated by a hacker ethos, in which
external documentation is held in contempt, and even code comments
are viewed with suspicion."

- Jerry Epplin