Presented by
Modal
AI infrastructure that developers love

Speculation is all you need: Intro to Speculative Decoding for High Performance Inference

Virtual
About Event

When it comes to low-latency, high-performance inference, speculation is all you need.

Join us for a webinar with Shankha Biswas, who has worked with DFlash and EAGLE3 draft models on Modal's inference optimization team. Learn the basics of speculative decoding and when draft models can unlock major gains in inference efficiency.

We'll dive into:

  • Inference performance with and without speculative decoding

  • Training draft models (EAGLE3 → custom speculators)

  • Lessons from Modal’s work with Decagon

  • Resources to get started
