Led by Harsha Nelaturu and Andrej Jovanović. Part of the Cohere Labs Open Science initiative https://cohere.com/research/open-science

Doğaç Eldenk - Attention Drift – What speculative decoding models learn

Google Meet
About Event

Speculative decoding speeds up LLM inference by drafting tokens with a small model and verifying them with the target model, but drafters degrade sharply under template perturbation and long contexts. We identify a new phenomenon, attention drift: as the drafter generates within a speculation chain, its attention shifts away from the prompt and onto its own recent tokens. We trace this to hidden-state magnitudes that accumulate across drafting steps, and we fix it with a post-norm architecture, EAGLE 3.1, that improves both resilience and performance.
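As a rough intuition for why normalization placement matters when a drafter feeds its own hidden state back in over several steps, the toy below compares a pre-norm residual update with a post-norm one. It is a minimal sketch under stated assumptions, not EAGLE 3.1's architecture: the elementwise `tanh` "layer", the 64-dimensional state, and the names `toy_block` and `draft_chain` are all hypothetical, chosen only to show the generic residual-stream effect.

```python
import math
import random


def rms_norm(v, eps=1e-6):
    # Rescale v to unit root-mean-square (no learned gain, for simplicity).
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def l2(v):
    return math.sqrt(sum(x * x for x in v))


def toy_block(v):
    # Hypothetical stand-in for a drafter layer: an elementwise tanh.
    # tanh preserves sign, so its output always points "along" its input.
    return [math.tanh(x) for x in v]


def draft_chain(steps, dim=64, post_norm=False, seed=0):
    """Return the L2 norm of the hidden state after each toy drafting step.

    Assumption: the drafter's output hidden state is reused as its next
    input, loosely mimicking reuse of its own features in a speculation chain.
    """
    rng = random.Random(seed)
    h = rms_norm([rng.gauss(0.0, 1.0) for _ in range(dim)])
    norms = [l2(h)]
    for _ in range(steps):
        if post_norm:
            # Post-norm: normalize after the residual add, so the state
            # handed to the next step always has unit RMS.
            h = rms_norm([a + b for a, b in zip(h, toy_block(h))])
        else:
            # Pre-norm: only the block input is normalized; the residual
            # stream itself keeps accumulating every update.
            h = [a + b for a, b in zip(h, toy_block(rms_norm(h)))]
        norms.append(l2(h))
    return norms


if __name__ == "__main__":
    pre = draft_chain(8)
    post = draft_chain(8, post_norm=True)
    print("pre-norm :", [round(n, 1) for n in pre])
    print("post-norm:", [round(n, 1) for n in post])
```

In this toy the pre-norm state's magnitude grows at every step, while the post-norm state stays pinned near `sqrt(dim)`. Whether and how that growth turns into attention drift in a real drafter is the subject of the talk and is not modeled here.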

Bio: Doğaç is a Master's student in Computer Science at Northwestern University and is joining Fal as a Machine Learning Engineer. His work focuses on inference acceleration, from speculative decoding to agentic GPU kernel optimization and discovery.
