vLLM v0.14.0

2026.01.21 · LinkedIn · by 이호민

Key Points

  • vLLM v0.14.0 introduces significant changes, including default async scheduling, a PyTorch 2.9.1 requirement, and the removal of deprecated quantization schemes, while adding a gRPC server and `--max-model-len auto` for efficient GPU memory usage.
  • The release expands model compatibility to Grok-2 and several multimodal architectures, adds multimodal LoRA support for models like LLaVA, and enhances performance with CUTLASS MoE optimizations.
  • Hardware support is updated for SM103 and B300 Blackwell, with new large-scale serving features like Extended Dual-Batch Overlap (XBO) and NIXL asymmetric TP to improve efficiency.

The vLLM v0.14.0 release introduces significant enhancements across performance, model compatibility, and system architecture, incorporating 660 commits from 251 contributors. This update includes several breaking changes, necessitating careful review before upgrading.

Key Breaking Changes:

  1. Asynchronous Scheduling as Default: Asynchronous request scheduling is now enabled by default, and can be explicitly disabled using the `--no-async-scheduling` flag. This shift aims to improve concurrency and resource utilization by allowing non-blocking operations.
  2. PyTorch Version Requirement: The minimum required PyTorch version is now 2.9.1, with the default wheel compiled against cu129 (CUDA 12.9). This ensures compatibility with recent PyTorch features and CUDA capabilities.
  3. Quantization Scheme Removal: Deprecated quantization schemes have been removed to streamline the codebase and focus on actively supported methods.
  4. Speculative Decoding Error Handling: Speculative decoding, when encountering unsupported sampling parameters, will now explicitly fail rather than silently ignoring these parameters. This change improves robustness and prevents unexpected behavior.
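For teams upgrading, the most upgrade-relevant switches can be checked at launch. This is a minimal sketch; `<your-model>` is a placeholder, and the `--no-async-scheduling` flag is the one named in the release notes:

```shell
# Confirm the installed PyTorch meets the new 2.9.1 minimum
python -c "import torch; print(torch.__version__)"

# Opt out of the new default async scheduling if a workload
# depends on the previous synchronous behavior
vllm serve <your-model> --no-async-scheduling
```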

Core Methodological and Architectural Improvements:

  1. gRPC Server Entrypoint: A new gRPC server entrypoint is introduced, leveraging a binary protocol and HTTP/2 multiplexing. This provides a high-throughput serving mechanism, enabling more efficient communication between clients and the vLLM server, particularly beneficial for microservices architectures and high-load environments. The binary nature reduces overhead compared to text-based protocols, while HTTP/2 multiplexing allows multiple requests over a single connection, improving efficiency.
  2. Automatic Context Length Management (--max-model-len auto): This feature automatically adjusts the model's maximum context length to fit the available GPU memory, thereby mitigating Out-of-Memory (OOM) errors during startup. This implies an intelligent memory profiling or estimation mechanism that dynamically configures the maximum sequence length based on the detected GPU hardware and memory capacity.
  3. Model Inspection View: A new model inspection view is available by setting `VLLM_LOG_MODEL_INSPECTION=1` or printing the LLM object. This feature allows users to programmatically or via logs inspect internal model components, including modules, attention backends, and applied quantization methods, aiding in debugging and understanding model configurations.
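The memory-fitting idea behind `--max-model-len auto` can be sketched as a simple budget calculation: the KV cache cost of one token follows from the model's layer count, KV-head count, and head dimension, and the context length is capped so the cache fits in free GPU memory. The function names and formula below are illustrative, not vLLM's actual implementation:

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies: one K and one V tensor per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

def fit_max_model_len(free_gpu_bytes: int, num_layers: int, num_kv_heads: int,
                      head_dim: int, dtype_bytes: int = 2,
                      utilization: float = 0.9) -> int:
    """Largest context length whose KV cache fits in the memory budget."""
    budget = int(free_gpu_bytes * utilization)
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads,
                                         head_dim, dtype_bytes)
    return budget // per_token
```

For a Llama-style configuration (32 layers, 8 KV heads, head dimension 128, fp16), each token costs 128 KiB of KV cache, so the affordable context length scales linearly with free memory.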

Extended Model Support:

The release significantly expands support for various large language model (LLM) architectures and modalities, including:

  • Grok-2, with integration of its tiktoken tokenizer.
  • LFM2-VL, a vision-language model.
  • MiMo-V2-Flash.
  • GLM-ASR, an audio-based model.
  • K-EXAONE-236B-A23B, a Mixture-of-Experts (MoE) architecture.
Additionally, LoRA (Low-Rank Adaptation) now supports multimodal tower/connector structures, enhancing adaptability for models such as LLaVA, BLIP2, PaliGemma, and Pixtral. This allows for efficient fine-tuning of specific components within complex multimodal architectures.
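The mechanism LoRA applies to these tower/connector layers is the standard low-rank update W' = W + (α/r)·BA, trained while the base weight stays frozen. A minimal NumPy sketch of that update (illustrative only; names and shapes are not vLLM's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 64, 32, 4, 8

W = rng.standard_normal((d_out, d_in))        # frozen base weight (e.g., a vision-connector projection)
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-initialized

def lora_forward(x, W, A, B, alpha=8, rank=4):
    # Base path plus the scaled low-rank update; only A and B are trained.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d_in))
# With B zero-initialized, the adapter starts as an exact no-op.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Because the rank r is small, only (d_in + d_out)·r parameters per adapted layer are trained, which is what makes fine-tuning individual towers or connectors of a multimodal model cheap.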

Performance Optimizations:

Several low-level and large-scale serving optimizations have been implemented:

  • CUTLASS MoE Optimizations: Specific optimizations for Mixture-of-Experts (MoE) models using the CUTLASS library yield a 2.9% improvement in throughput and a 10.8% reduction in Time-To-First-Token (TTFT). This is attributed to a fill(0) optimization, likely a memory initialization or kernel fusion technique that reduces latency.
  • Hardware-Specific Enhancements:
    • Support for NVIDIA SM103 (a Blackwell-generation compute capability).
    • Specific B300 Blackwell MoE configurations to leverage the unique capabilities of these high-performance GPUs.
    • Integration of Marlin, an optimized mixed-precision GEMM kernel for quantized (e.g., 4-bit) weights, for Turing (sm75) architectures, extending fast quantized inference to older GPU generations.
  • Large-Scale Serving Techniques:
    • XBO (Extended Dual-Batch Overlap): This technique likely optimizes GPU utilization by overlapping computation and communication/data transfer across two batches, maximizing throughput in multi-GPU or distributed settings.
    • NIXL asymmetric TP (Tensor Parallelism): This refers to an advanced form of tensor parallelism, possibly involving non-uniform distribution or specialized communication patterns, to optimize large model serving across multiple devices or nodes, particularly where GPU resources are asymmetric.
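The dual-batch overlap idea can be illustrated as a two-stage pipeline: while batch i is being computed, batch i+1's transfer is already in flight, so the compute units never idle waiting on data. The sketch below is a conceptual CPU-side illustration using threads, not vLLM's implementation; `transfer` and `compute` are stand-ins for communication and the forward pass:

```python
from concurrent.futures import ThreadPoolExecutor

def transfer(batch):   # stand-in for host-to-device copy / inter-GPU communication
    return f"t{batch}"

def compute(staged):   # stand-in for the GPU forward pass
    return f"c{staged}"

def serve_overlapped(batches):
    if not batches:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(transfer, batches[0])
        for i, _ in enumerate(batches):
            staged = pending.result()
            if i + 1 < len(batches):
                # Kick off the next batch's transfer so it overlaps
                # with the current batch's compute.
                pending = pool.submit(transfer, batches[i + 1])
            results.append(compute(staged))
    return results
```

With ideal overlap, total time approaches max(transfer, compute) per batch instead of their sum, which is the throughput gain such techniques target in multi-GPU serving.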