A Deep Dive into vLLM Pipeline Parallelism: Architecture, Source Code, and Performance Trade-offs 📅 2026-04-03 ✍️ 14286 characters ⏱️ 32 min read Source Code Analysis Distributed Parallel
A Deep Dive into vLLM EPD: Disaggregated Encoder / Encoder-Prefill/Decode Source Code Walkthrough 📅 2026-04-02 ✍️ 12828 characters ⏱️ 29 min read Source Code Analysis
vLLM Model Runner V2 Design Document: From Persistent Batch and Async-First to the Triton Native Sampler 📅 2026-03-25 ✍️ 4640 characters ⏱️ 11 min read vLLM