Intel's New LLM-Scaler Beta Update Brings Whisper Model & GLM-4.5-Air Support
([Intel] 6 Hours Ago, llm-scaler-vllm)
- Reference: 0001571125
- News link: https://www.phoronix.com/news/Intel-llm-scaler-vllm-Whisper
Earlier this month [1]Intel released LLM-Scaler 1.0 as part of their Project Battlematrix initiative. LLM-Scaler is a Docker container effort for delivering speedy AI inference performance with multi-GPU scaling, PCIe P2P support, and more.
Following that v1.0 announcement, Intel software engineers yesterday released "0.9.0-b3" as a new beta of the llm-scaler-vllm Docker build.
The updated LLM-Scaler vLLM beta adds Whisper model support, GLM-4.5-Air support, image input for GLM-4.1V-9B-Thinking, and the dots.ocr model. On top of supporting the additional models, yesterday's beta also optimizes vLLM memory usage and enables the Ray pipeline-parallelism back-end.
Downloads and more details on the new Intel LLM-Scaler vLLM release via [3]GitHub.
[1] https://www.phoronix.com/news/Intel-LLM-Scaler-1.0
[3] https://github.com/intel/llm-scaler/releases/tag/vllm-0.9.0-b3