
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Key Points
- TurboQuant is an online vector quantization method designed to achieve near-optimal distortion rates for both mean-squared error (MSE) and inner product estimation, overcoming limitations of existing techniques.
- It achieves this by randomly rotating input vectors to induce a Beta distribution per coordinate, enabling optimal scalar quantization of each coordinate for MSE, and employs a two-stage approach with 1-bit Quantized JL (QJL) on residuals for unbiased inner product estimation.
- The method provides provably near-optimal distortion bounds (within a factor of roughly 2.7 of the theoretical limits) and demonstrates strong empirical performance in KV cache quantization and nearest neighbor search tasks, outperforming product quantization while reducing indexing time.
The paper introduces TurboQuant, a novel vector quantization (VQ) framework designed for high-dimensional Euclidean vectors, aiming to minimize mean-squared error (MSE) and inner product distortion. It addresses limitations of existing methods, which often lack optimal distortion rates, computational efficiency, or accelerator compatibility, making them unsuitable for online, real-time AI applications like KV cache quantization. TurboQuant is data-oblivious, suitable for online applications, and highly accelerator-friendly, achieving near-optimal distortion rates across various bit-widths and dimensions.
The core methodology of TurboQuant is a two-stage process. First, it proposes an MSE-optimal vector quantizer that minimizes the ℓ₂ norm of the residual. Second, recognizing that MSE-optimal quantizers introduce bias in inner product estimation, it applies a 1-bit Quantized Johnson-Lindenstrauss (QJL) transform to the residual of the first stage's output, yielding an unbiased, low-distortion inner product quantizer.
1. MSE-Optimal Quantizer:
For an input vector x ∈ ℝ^d with ‖x‖₂ = 1, the method operates as follows:
- Random Rotation: The input vector is first transformed by a random rotation matrix Π, yielding z = Πx. This random rotation ensures that each coordinate z_i of the rotated vector follows a (scaled/shifted) Beta distribution on [-1, 1] (Lemma 1).
- Near-Independence in High Dimensions: As the dimension d grows, this Beta distribution converges to the Gaussian N(0, 1/d), and, crucially, distinct coordinates of z become nearly independent. This property simplifies the quantization problem significantly, allowing each coordinate to be scalar-quantized independently.
- Optimal Scalar Quantization: An optimal scalar quantizer is applied to each coordinate z_i. The optimal centroids c_1, …, c_{2^b} are determined by solving a continuous one-dimensional k-means problem (Lloyd-Max algorithm) for the coordinate's Beta distribution, i.e., minimizing the expected quantization cost E[min_j (z_i - c_j)²]. These centroids are precomputed and stored for efficiency.
- Quantization and Dequantization (Algorithm 1):
- Quantization: For each coordinate of the rotated vector z, the closest centroid is found and its index, encoded as a b-bit integer, is stored. This yields an index vector in {0, …, 2^b - 1}^d.
- Dequantization: To reconstruct, the stored indices are used to retrieve the corresponding centroids, forming a reconstructed vector ẑ. This vector is then rotated back by Π^T (= Π^{-1}) to obtain the final dequantized vector.
- Performance Guarantees (Theorem 1): At b bits per coordinate, the MSE distortion decays as 2^(-2b), within a small constant factor (at most roughly 2.7) of the information-theoretic lower bound; for small bit-widths the constants are tighter still.
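The pipeline above (rotate, quantize each coordinate against precomputed centroids, then invert the rotation) can be sketched in NumPy. This is an illustrative reconstruction, not the paper's code: the Lloyd-Max centroids are approximated by k-means on Gaussian samples (using the Gaussian limit of the Beta distribution noted above), and all helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lloyd_max_centroids(bits, n_samples=100_000, iters=30):
    """Approximate Lloyd-Max centroids for N(0, 1) via k-means on samples.
    (The paper solves the continuous problem for the exact Beta density.)"""
    samples = rng.standard_normal(n_samples)
    k = 2 ** bits
    # Initialize at evenly spaced quantiles, then run Lloyd iterations.
    centroids = np.quantile(samples, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        idx = np.abs(samples[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            pts = samples[idx == j]
            if len(pts):
                centroids[j] = pts.mean()
    return np.sort(centroids)

def random_rotation(d):
    # QR decomposition of a Gaussian matrix gives a Haar-random rotation.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def quantize(x, rotation, centroids):
    z = rotation @ x  # rotated coordinates are near-Gaussian, near-independent
    return np.abs(z[:, None] - centroids[None, :]).argmin(axis=1)  # b-bit codes

def dequantize(codes, rotation, centroids):
    return rotation.T @ centroids[codes]  # look up centroids, rotate back

d, b = 256, 3
# Coordinates of a rotated unit vector have std ~ 1/sqrt(d), so rescale.
centroids = lloyd_max_centroids(b) / np.sqrt(d)
rot = random_rotation(d)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
codes = quantize(x, rot, centroids)
x_hat = dequantize(codes, rot, centroids)
mse = np.sum((x - x_hat) ** 2)  # should be within a small factor of 2^(-2b)
```

Because the rotation is orthogonal, the reconstruction error in the rotated space equals the error in the original space, so only the per-coordinate scalar quantizers matter for the MSE.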
2. Inner Product Optimal Quantizer:
The MSE-optimal quantizer minimizes ℓ₂ distance but introduces bias in inner product estimation. To provide an unbiased inner product estimate with low distortion, TurboQuant employs a two-stage approach:
- Stage 1: MSE Quantization for Residual Reduction: The input vector x is first quantized with the MSE-optimal quantizer from Section 1, yielding a dequantized vector x̂. The residual is then r = x - x̂. This step minimizes the ℓ₂ norm of the residual.
- Stage 2: 1-bit QJL on Residual: The residual vector is then quantized using the 1-bit Quantized Johnson-Lindenstrauss (QJL) transform.
- QJL Definition (Definition 1): The quantizer maps x ↦ sign(Sx) ∈ {-1, +1}^m for x ∈ ℝ^d, where S ∈ ℝ^{m×d} is a random matrix with i.i.d. N(0, 1) entries. The dequantization map sends a sign vector y ∈ {-1, +1}^m to (√(π/2)·‖x‖₂/m)·Sᵀy.
- QJL Performance (Lemma 4): QJL provides an unbiased estimate for inner products: E[⟨q, Dequant(sign(Sx))⟩] = ⟨q, x⟩. Its variance is bounded by O(‖q‖²‖x‖²/m), decaying linearly in the number of projections m.
- Performance Guarantees (Theorem 2): The combined two-stage quantizer is unbiased for inner product estimation: its dequantized output x̃ satisfies E[⟨q, x̃⟩] = ⟨q, x⟩. The inner product distortion decays as 2^(-b) at b bits per coordinate, again within a small constant factor of the lower bound.
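A minimal end-to-end sketch of the two-stage estimator follows. Hedges: the Stage-1 MSE quantizer is replaced by a simple per-coordinate grid rounding as a stand-in for the rotated Lloyd-Max quantizer, and the function names (`grid_quantize`, `qjl_encode`, `estimate_inner_product`) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def grid_quantize(x, step=1 / 16):
    """Stand-in for the Stage-1 MSE quantizer: round each coordinate to a grid.
    (TurboQuant instead uses the rotation + Lloyd-Max quantizer.)"""
    return np.round(x / step) * step

def qjl_encode(r, S):
    """Stage 2, 1-bit QJL: store only the signs of m random Gaussian
    projections of the residual, plus the residual's norm."""
    return np.sign(S @ r), np.linalg.norm(r)

def estimate_inner_product(q, x_hat, signs, r_norm, S):
    """Unbiased estimate of <q, x>: the exact part <q, x_hat> plus the QJL
    estimate of <q, r>, using E[<s, q> * sign(<s, r>)] = sqrt(2/pi) * <q, r> / ||r||."""
    m = S.shape[0]
    qjl_part = np.sqrt(np.pi / 2) / m * r_norm * (S @ q) @ signs
    return q @ x_hat + qjl_part

d, m = 64, 20_000
x = rng.standard_normal(d); x /= np.linalg.norm(x)
q = rng.standard_normal(d); q /= np.linalg.norm(q)

x_hat = grid_quantize(x)           # Stage 1: coarse, biased reconstruction
r = x - x_hat                      # residual carries the Stage-1 bias
S = rng.standard_normal((m, d))    # shared random projection matrix
signs, r_norm = qjl_encode(r, S)   # Stage 2: 1 bit per projection

est = estimate_inner_product(q, x_hat, signs, r_norm, S)
true_ip = q @ x                    # est should match this up to O(1/sqrt(m)) noise
```

The design point to notice: because Stage 1 already makes ‖r‖ small, the QJL noise (proportional to ‖r‖/√m) is correspondingly small, which is why correcting only the residual recovers an unbiased estimate cheaply.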
3. Information-Theoretic Lower Bounds (Theorem 3):
The paper provides information-theoretic lower bounds on the best achievable distortion rates for any randomized vector quantizer, using Shannon's lower bound (SLB) and Yao's minimax principle. For a unit-norm input vector in ℝ^d and total bit budget B:
- MSE Lower Bound: any (possibly randomized) quantizer using B bits incurs MSE distortion Ω(2^(-2B/d)), i.e., Ω(2^(-2b)) at b = B/d bits per coordinate.
- Inner Product Lower Bound: the inner product distortion is at least Ω(2^(-B/d)) = Ω(2^(-b)).
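Reading the bounds asymptotically (the exact constants of Theorem 3 are not reproduced here), the distortion floors at a per-coordinate budget b = B/d can be tabulated:

```python
# Asymptotic distortion floors for a unit vector at b = B/d bits per coordinate.
# Only the 2^(-2b) (MSE) and 2^(-b) (inner product) decay rates are shown;
# the constant factors from Theorem 3 are omitted.
for b in (1, 2, 3, 4):
    mse_floor = 2.0 ** (-2 * b)   # MSE lower bound scaling
    ip_floor = 2.0 ** (-b)        # inner product lower bound scaling
    print(f"b={b}: MSE ~ {mse_floor:.4f}, inner product ~ {ip_floor:.4f}")
```

Note the gap between the two rates: an unbiased inner product estimate pays 2^(-b) where a biased MSE-optimal reconstruction achieves 2^(-2b), which is exactly the trade-off the two-stage design navigates.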
Experimental Validation:
Experimental results confirm the theoretical findings:
- Observed distortions closely align with predictions across various real-world datasets, approaching the established lower bounds.
- For KV cache quantization, TurboQuant achieves quality neutrality (perfect long-context retrieval) with 3.5 bits per channel and marginal quality degradation with 2.5 bits per channel, compressing the KV cache by more than 4× relative to a 16-bit baseline.
- In nearest neighbor search tasks, TurboQuant consistently outperforms existing data-dependent product quantization (PQ) techniques in recall while reducing indexing time to virtually zero, owing to its data-oblivious nature.