Presented by

90/30 Club

We meet weekly in-person to talk about new ML papers! Come and join the discussion!

Hosted By

56 Went

AI

90/30 Club (ML reading) #47: TurboQuant: Near-Optimal Vector Quantization for LLM Memory

Name: 90/30 Club (ML reading) #47: TurboQuant: Near-Optimal Vector Quantization for LLM Memory
Start: 2026-04-06T19:00:00.000-07:00
End: 2026-04-06T21:30:00.000-07:00
Location: San Francisco, California

Past Event

About Event

Week 47: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

Paper Link
TurboQuant rests on a simple idea: randomly rotate a high-dimensional vector and its coordinates become nearly independent and follow a known, concentrated distribution, so quantizing each coordinate with an optimal scalar quantizer still gives near-optimal performance for the whole vector. The result is a data-oblivious, online quantization scheme whose distortion rate is within a small constant factor of the information-theoretic optimum.
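For readers who want to see the rotate-then-quantize idea in code before the discussion, here is a minimal NumPy sketch. It is not the paper's exact construction: the Gaussian approximation of the rotated coordinates, the use of Lloyd's algorithm to fit the scalar codebook, and all function names (`random_rotation`, `gaussian_codebook`, `quantize`, `dequantize`) are assumptions made for illustration.

```python
import numpy as np


def random_rotation(d, rng):
    """Draw a d x d random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the result is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))


def gaussian_codebook(d, bits, rng, n_samples=20000, iters=30):
    """Fit 2**bits scalar levels to N(0, 1/d) with Lloyd's algorithm.

    Assumption of this sketch: coordinates of a rotated unit vector are
    roughly N(0, 1/d), so the codebook is fit to synthetic samples and
    never looks at real data (that is the data-oblivious part).
    """
    samples = rng.standard_normal(n_samples) / np.sqrt(d)
    levels = np.quantile(samples, (np.arange(2**bits) + 0.5) / 2**bits)
    for _ in range(iters):
        nearest = np.abs(samples[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(levels.size):
            members = samples[nearest == k]
            if members.size:
                levels[k] = members.mean()
    return np.sort(levels)


def quantize(x, rot, levels):
    """Return per-coordinate level indices plus the vector norm."""
    norm = float(np.linalg.norm(x))
    y = rot @ (x / norm) if norm > 0 else np.zeros_like(x)
    idx = np.abs(y[:, None] - levels[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), norm


def dequantize(idx, norm, rot, levels):
    """Look up the levels, undo the rotation, and restore the norm."""
    return norm * (rot.T @ levels[idx])


rng = np.random.default_rng(0)
d = 128
rot = random_rotation(d, rng)
levels = gaussian_codebook(d, bits=3, rng=rng)

# A deliberately "spiky" input: all energy in one coordinate.
x = np.zeros(d)
x[0] = 5.0
idx, norm = quantize(x, rot, levels)
x_hat = dequantize(idx, norm, rot, levels)
print("relative squared error:", np.sum((x - x_hat) ** 2) / np.sum(x**2))
```

The spiky input is the interesting case: without the rotation, a fixed per-coordinate codebook would handle it poorly, while after the rotation its energy is spread across all coordinates. The sketch stores the norm as a separate float, which is a design choice of this example.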
This matters most for KV cache compression in large language models. The paper reports that KV cache storage can be reduced to roughly 3–3.5 bits per channel with essentially no quality loss, which directly targets one of the biggest bottlenecks in long-context inference.
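To put the bits-per-channel figure in perspective, the following back-of-the-envelope calculation compares a 16-bit KV cache with a 3.5-bit one. The model shape (32 layers, 8 KV heads, head dimension 128, 131,072 tokens) is a hypothetical configuration chosen for illustration, not one taken from the paper, and the estimate ignores any per-vector overhead such as stored norms or scales.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits_per_channel):
    """Approximate KV cache size in bytes for one sequence.

    The factor of 2 accounts for storing both keys and values.
    """
    channels = 2 * layers * kv_heads * head_dim * seq_len
    return channels * bits_per_channel / 8


GIB = 2**30
# Hypothetical model shape, for illustration only.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=131072)

fp16 = kv_cache_bytes(**shape, bits_per_channel=16)
low_bit = kv_cache_bytes(**shape, bits_per_channel=3.5)

print(f"16-bit cache:  {fp16 / GIB:.1f} GiB")     # 16.0 GiB
print(f"3.5-bit cache: {low_bit / GIB:.1f} GiB")  # 3.5 GiB
print(f"reduction:     {fp16 / low_bit:.2f}x")
```

Under these assumptions the cache for a single long sequence drops from 16 GiB to 3.5 GiB.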

Join us at Mox to explore:

- Is TurboQuant actually a breakthrough, or is it a clever recombination of classical ideas?

- What matters more in practice: provable near-optimality or engineering simplicity + deployability?

- If KV cache is the real bottleneck for long-context LLMs, does this shift where we should focus optimization (away from weights → toward runtime state)?

🔎Analyzed Papers

Optional quiet reading from 19:00; discussion starts at 20:00.
