

90/30 Club (ML reading) #30: Modern OCR: Efficient Recognition in the LLM Era
Week 30: DeepSeek-OCR: Contexts Optical Compression
The Paper Link Here
Modern OCR systems built in the foundation-model era, exemplified by DeepSeek's OCR architecture, reframe text extraction as a unified vision–language problem rather than a pipeline of separate detection and recognition modules. Instead of relying on traditional segmentation or rule-based preprocessing, the model uses a high-resolution visual encoder to convert entire document images into dense perceptual embeddings. Cross-attention layers then fuse spatial layout information with learned linguistic priors, enabling robust reading across cluttered pages, screenshots, receipts, handwriting, and multi-column formats.

DeepSeek's implementation introduces notable innovations in consistency-regularized training and structured decoding. Through heavy augmentation (distortion, compression, blur, rotation, and low-light variants), the model learns feature representations that remain stable under real-world noise. Structural decoding heads let the system identify tables, key–value pairs, and irregular layout regions without templates, capturing not only the text itself but also its semantic relationships. This marks a shift from character-level transcription toward context-sensitive document understanding.
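To make the encode-then-cross-attend idea concrete, here is a minimal PyTorch sketch of that pipeline: a patch encoder turns a page image into a grid of dense visual embeddings, and a text decoder reads them through cross-attention. Everything here (module names, dimensions, vocabulary size) is an illustrative assumption, not DeepSeek-OCR's actual code.

```python
# Illustrative sketch only: a vision encoder produces patch embeddings,
# and a decoder reads them via cross-attention. Sizes are made up.
import torch
import torch.nn as nn


class PatchEncoder(nn.Module):
    """Convert an image into a sequence of dense patch embeddings."""

    def __init__(self, patch=16, dim=256):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, images):                      # (B, 3, H, W)
        feats = self.proj(images)                   # (B, dim, H/16, W/16)
        return feats.flatten(2).transpose(1, 2)     # (B, N_patches, dim)


class CrossAttnReader(nn.Module):
    """Decode text tokens while attending to the visual embeddings."""

    def __init__(self, vocab=32000, dim=256, heads=8, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, token_ids, visual_tokens):
        tgt = self.embed(token_ids)                 # (B, T, dim)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.decoder(tgt, visual_tokens, tgt_mask=causal)
        return self.lm_head(hidden)                 # (B, T, vocab)


if __name__ == "__main__":
    enc, dec = PatchEncoder(), CrossAttnReader()
    page = torch.randn(1, 3, 224, 224)              # a fake document image
    prev_tokens = torch.randint(0, 32000, (1, 12))  # previously decoded text
    logits = dec(prev_tokens, enc(page))
    print(logits.shape)                             # torch.Size([1, 12, 32000])
```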
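And a similarly hedged sketch of the consistency-regularization idea, reusing the hypothetical encoder and reader from above: the same page is encoded both clean and under a random corruption, and a feature-stability penalty is added to the usual next-token loss. The transforms and weighting are illustrative choices, not the paper's recipe.

```python
# Hedged sketch: clean vs. corrupted views of the same page, with a
# feature-consistency term added to the decoding loss. Not the paper's code.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

corrupt = T.Compose([
    T.RandomRotation(degrees=5),                     # slight skew, as from scanning
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)), # defocus / compression-like blur
    T.ColorJitter(brightness=0.5),                   # low-light or over-exposed variants
])


def training_step(encoder, reader, images, token_ids, labels, lam=0.1):
    """One step combining next-token loss with a feature-consistency penalty."""
    clean_feats = encoder(images)                    # (B, N, dim)
    noisy_feats = encoder(corrupt(images))           # same pages, corrupted

    # Standard next-token cross-entropy on the clean view.
    logits = reader(token_ids, clean_feats)          # (B, T, vocab)
    ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten())

    # Encourage the encoder to produce stable features under corruption.
    consistency = F.mse_loss(noisy_feats, clean_feats.detach())
    return ce + lam * consistency
```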
Empirically, this class of OCR models dramatically outperforms legacy approaches such as Tesseract, CRNN-CTC pipelines, and specialized scene-text engines. DeepSeek reports large gains in multilingual and long-text scenarios, demonstrating strong zero-shot generalization to unseen scripts, stylized fonts, and unconventional document formats. The result is an OCR system that behaves far more like a reader than a scanner: able not just to recognize characters but to reason about structure, hierarchy, and meaning within complex documents.
Join us at Mox to explore:
- How does treating OCR as a unified vision–language task improve generalization across diverse document types and layouts?
- Do structural decoding heads make OCR models more resilient to adversarial formatting or obfuscated text?
Discussion at 20:00; (optional) quiet reading from 19:00.