AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration 📅 2026-01-29 ✍️ 7858 characters ⏱️ 18 min read Quantization Paper
Context Parallel: A Technical Analysis 📅 2026-01-27 ✍️ 9054 characters ⏱️ 21 min read Context Parallel Distributed Parallel Long Context Optimization