
Intel Releases LLM-Scaler-vLLM 1.3 With New LLM Model Support

([Intel] LLM-Scaler-vLLM PV 1.3)


Intel today released the LLM-Scaler-vLLM 1.3 update, expanding the array of large language models that can run on Intel Arc Battlemage graphics cards with this Docker-based stack for deploying vLLM.

The new Intel llm-scaler-vllm 1.3 release, available via Docker and GitHub, adds support for eight new models on capable Intel Arc Graphics hardware: Qwen3-Next-80B-A3B-Instruct, Qwen3-Next-80B-A3B-Thinking, InternVL3.5-30B-A3B, DeepSeek-OCR, PaddleOCR-VL, Seed-OSS-36B-Instruct, Qwen3-30B-A3B-Instruct-2507, and openai/whisper-large-v3.
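Since the stack serves models through vLLM's OpenAI-compatible HTTP API, any of the newly supported models can be queried with a standard client once the container is up. A minimal sketch, assuming the server is listening on localhost port 8000 (vLLM's default) and that Qwen3-Next-80B-A3B-Instruct was loaded; the base URL and model identifier are illustrative, not confirmed by the release notes:

    # Minimal client sketch against vLLM's OpenAI-compatible endpoint.
    # The base URL, port, and model ID are assumptions; adjust them to
    # match how the llm-scaler-vllm container was actually launched.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # vLLM's default serving address
        api_key="EMPTY",                      # vLLM ignores the key by default
    )

    response = client.chat.completions.create(
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",  # illustrative model ID
        messages=[{"role": "user", "content": "Hello from Battlemage!"}],
        max_tokens=64,
    )
    print(response.choices[0].message.content)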

In addition to those models, support for PaddleOCR models and for GLM-4.6v-Flash is noted separately. There is also now sym_int4 support for Qwen3-30B-A3B on TP 4/8 and for Qwen3-235B-A22B on TP 16.
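For the quantized configurations, the invocation presumably passes the quantization method and tensor-parallel degree through vLLM's standard engine arguments. A rough sketch follows; "sym_int4" as a quantization string is an Intel-stack-specific assumption, not an upstream vLLM option, so check the release documentation for the exact flag:

    # Hypothetical sketch: Qwen3-30B-A3B with sym_int4 quantization across
    # four GPUs (TP 4). The "sym_int4" value is an assumption based on the
    # release notes -- verify the exact spelling against the llm-scaler docs.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen3-30B-A3B",   # illustrative model path
        quantization="sym_int4",      # assumed quantization identifier
        tensor_parallel_size=4,       # TP 4, per the release notes
    )

    out = llm.generate(["What is tensor parallelism?"],
                       SamplingParams(max_tokens=64))
    print(out[0].outputs[0].text)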

The LLM-Scaler-vLLM stack has been upgraded to vLLM 0.11.1 and PyTorch 2.9. Alongside the vLLM upgrade, Intel has also enabled CPU KV cache offload, speculative decoding support with two more methods, an experimental FP8 KV cache, and other enhancements.
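Those features map onto standard engine arguments in recent upstream vLLM releases. The sketch below shows how the experimental FP8 KV cache and n-gram speculative decoding are typically enabled in vLLM of this vintage; the release notes do not say which two speculative methods Intel added or whether the stack exposes these arguments unchanged, so treat the values as assumptions:

    # Sketch of enabling FP8 KV cache and speculative decoding through
    # vLLM's offline LLM API. Argument spellings follow upstream vLLM
    # ~0.11; the Intel stack may gate or rename them, so verify first.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen3-30B-A3B-Instruct-2507",  # one of the newly added models
        kv_cache_dtype="fp8",                      # experimental FP8 KV cache
        speculative_config={                       # n-gram lookup as one example method
            "method": "ngram",
            "num_speculative_tokens": 4,
            "prompt_lookup_max": 4,
        },
    )

    out = llm.generate(["Explain KV cache offload in one sentence."],
                       SamplingParams(max_tokens=96))
    print(out[0].outputs[0].text)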

Plus there are more bug fixes and other improvements with Intel LLM-Scaler-vLLM 1.3. Downloads and all the details are available via GitHub [1].



[1] https://github.com/intel/llm-scaler/releases/tag/vllm-1.3



"Take that, you hostile sons-of-bitches!"
-- James Coburn, in the finale of _The_President's_Analyst_