

MLOps Reading Group August – Context Rot: How Increasing Input Tokens Impacts LLM Performance
When Bigger Isn’t Always Better: How Context Length Can Break Your LLM
Longer context windows are the new bragging rights in LLMs — now stretching into the millions of tokens. But can models really handle the first and the 10,000th token equally well?
This month's paper, Context Rot: How Increasing Input Tokens Impacts LLM Performance, challenges that assumption. Testing 18 top models, including GPT-4.1, Claude 4, Gemini 2.5, and Qwen3, the authors found that even on simple tasks, performance can degrade as inputs grow.
Special guest: Kelly Hong, Chroma researcher and co-author of the paper, will join us to discuss the findings and answer questions.
What we’ll cover:
Why common benchmarks like Needle in a Haystack can be misleading
Evidence of “context rot” in semantic retrieval, Q&A, and repeated-word tasks
Implications for real-world uses like agents, summarization, and RAG
Practical tips for designing with long inputs
📅 Date: Thursday, August 28th
🕚 Time: 11 AM ET
💡 Special note: This month is a double feature! The reading group vote ended in a tie between the top two papers, so for the first time ever, we're doing both. The second paper, A Survey of Context Engineering for Large Language Models, is happening on September 4th. Join that session too →
Join the #reading-group channel in the MLOps Community Slack to connect before and after the session. Don’t miss this chance to hear directly from a paper co-author.