Deep Dive into vLLM Pipeline Parallelism: Architecture, Source Code, and Performance Trade-offs 📅 2026-04-03 ✍️ 10,281 characters ⏱️ 23 min read Source Code Analysis Distributed Parallel vLLM
Deep Dive into vLLM EPD: A Source Code Breakdown of Disaggregated Encoder / Encoder-Prefill/Decode 📅 2026-04-02 ✍️ 16,884 characters ⏱️ 38 min read Source Code Analysis vLLM
vLLM Model Runner V2 Design Document: From Persistent Batch and Async-First to Triton Native Sampler 📅 2026-03-25 ✍️ 4,640 characters ⏱️ 11 min read vLLM