Presented by
90/30 Club
We meet weekly in-person to talk about new ML papers! Come and join the discussion!
56 Went

90/30 Club (ML reading) #47: TurboQuant: Near-Optimal Vector Quantization for LLM Memory

Register to See Address
San Francisco, California
Registration
Past Event
About Event

Week 47: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Paper Link

TurboQuant proposes a simple but powerful idea: randomly rotate high-dimensional vectors so that their coordinates become nearly independent and follow a known, concentrated distribution, then quantize each coordinate separately with an optimal scalar quantizer and still get near-optimal overall distortion. The result is a data-oblivious, online quantization scheme whose distortion rate is within a small constant factor of the information-theoretic optimum.
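
For anyone who wants to poke at the rotate-then-quantize idea before the discussion, here is a minimal NumPy sketch. It is not the paper's implementation: the function names are made up for illustration, a plain uniform grid stands in for the optimized per-coordinate quantizer, the clipping range of three standard deviations is an arbitrary choice, and the vector norm is assumed to be stored separately in full precision.

```python
import numpy as np

def random_rotation(dim, rng):
    """Draw a random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))

def uniform_quantize(x, bits, clip):
    """Snap each coordinate to one of 2**bits evenly spaced levels in [-clip, clip]."""
    levels = 2 ** bits
    step = 2 * clip / levels
    idx = np.clip(np.floor((x + clip) / step), 0, levels - 1)
    return (idx + 0.5) * step - clip  # midpoint of the chosen bin

def quantize_with_rotation(x, rotation, bits):
    """Rotate, quantize every coordinate independently, rotate back."""
    norm = np.linalg.norm(x)  # assumption: kept in full precision
    y = rotation @ (x / norm)
    # Coordinates of a randomly rotated unit vector have std ~ 1/sqrt(dim),
    # so one fixed grid works for any input (the data-oblivious part).
    clip = 3.0 / np.sqrt(len(x))  # assumption: clip at ~3 standard deviations
    y_hat = uniform_quantize(y, bits, clip)
    return norm * (rotation.T @ y_hat)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 256
    rotation = random_rotation(dim, rng)
    spiky = np.zeros(dim)
    spiky[0] = 5.0  # worst case for quantizing coordinates without rotating
    approx = quantize_with_rotation(spiky, rotation, bits=4)
    rel_err = np.linalg.norm(spiky - approx) / np.linalg.norm(spiky)
    print(f"relative error at 4 bits per coordinate: {rel_err:.3f}")
```

The spiky input is the interesting case: without the rotation, all of its energy sits in one coordinate and a fixed grid would clip it badly, whereas after rotation the energy is spread evenly and the same grid handles it.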

What makes this especially relevant is its application to KV cache compression in large language models. The paper shows that you can push KV cache storage down to ~3–3.5 bits per channel with essentially no quality loss, directly attacking one of the biggest bottlenecks in long-context inference.
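
To get a feel for what those bit widths mean in memory, here is a back-of-the-envelope calculation. The model shape below is hypothetical and chosen only for illustration, and the estimate ignores any side information a real scheme would store, such as per-vector norms or scales.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_channel):
    """Rough KV cache size for one sequence, ignoring quantizer metadata."""
    # Factor of 2: one key and one value vector per token, per layer, per head.
    channels = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return channels * bits_per_channel / 8

# Hypothetical model shape, not taken from the paper.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=128_000)

fp16 = kv_cache_bytes(**shape, bits_per_channel=16)
q35 = kv_cache_bytes(**shape, bits_per_channel=3.5)

if __name__ == "__main__":
    gib = 2 ** 30
    print(f"16-bit cache:  {fp16 / gib:.2f} GiB")
    print(f"3.5-bit cache: {q35 / gib:.2f} GiB")
    print(f"reduction:     {fp16 / q35:.2f}x")
```

The cache grows linearly with sequence length, so the same ratio applies at any context size; what changes is whether the absolute number fits on the accelerator.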


Join us at Mox to explore:

- Is TurboQuant actually a breakthrough, or is it a clever recombination of classical ideas?

- What matters more in practice: provable near-optimality or engineering simplicity + deployability?

- If KV cache is the real bottleneck for long-context LLMs, does this shift where we should focus optimization (away from weights → toward runtime state)?

🔎Analyzed Papers

Discussion at 20:00; (optional) quiet reading from 19:00.
