

LLMOps at Scale: Patterns from over 1,365 Production Deployments
Architectural Realities, Context Engineering, and Systems Bottlenecks
Alex Strick van Linschoten (ML Engineer, ZenML) curates the LLMOps Database, a collection that now documents over 1,365 production AI deployments.
In this conversation, Alex and Hugo Bowne-Anderson will draw on findings from the database to identify the architectural patterns that work in high-volume environments, across industries and use cases. They’ll examine case studies from teams like Stripe, ByteDance, and Shopify to understand how those teams manage the systems engineering requirements of large-scale agents.
Topics will include:
How GetOnStack accidentally spent $47,000 in four weeks on a recursive agent loop. Learn how to build hard circuit breakers and kill switches based on P95 metrics to prevent runaway autonomous failures (a minimal sketch follows the topic list).
Reducing latency from 55 seconds to 1 second. A look at how Robinhood used hierarchical tuning and specialized 8B models to make real-time agentic systems viable.
Why teams are ditching vector DBs for grep and cat. The return of old-school engineering in the age of RAG, and why a Linux sandbox is often more effective than a complex retrieval database (sketched below).
The Choice Entropy Problem. Why giving an agent more tools and context actually makes it perform worse.
Moving safety from the prompt to the infrastructure. Why session tainting and architectural guardrails are the only way to prevent prompt injection in high-stakes financial and medical deployments (a toy tainting example appears below).
The Agent Harness Problem. Why the software wrapper around the model has become the primary bottleneck in shipping agents, and why teams are refactoring their harnesses five times in six months.
Shadow Mode Deployments. How to run AI in parallel with human experts to build an evaluation baseline before allowing the system to act (see the shadow-mode sketch below).
The Reality of MCP. A technical look at the Model Context Protocol one year later: why it has become the standard for some teams and a security concern for others.
Just-in-Time Context. How teams like Elyos AI keep latency low by injecting relevant data only at the exact moment of execution (final sketch below).
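To make a few of these patterns concrete ahead of the session, here are some minimal Python sketches. First, the circuit breaker. Nothing below is GetOnStack's actual implementation; the class name, the $100 cap, and the 30-second P95 limit are illustrative assumptions. The idea is a cumulative spend cap plus a rolling P95 latency check, either of which hard-stops the loop:

```python
import statistics
import time


class CircuitBreakerTripped(RuntimeError):
    """Raised when the agent exceeds a hard budget or latency limit."""


class AgentCircuitBreaker:
    """Hard kill switch for an agent loop: trips on total spend or rolling P95 latency."""

    def __init__(self, max_spend_usd: float, max_p95_latency_s: float, window: int = 50):
        self.max_spend_usd = max_spend_usd
        self.max_p95_latency_s = max_p95_latency_s
        self.window = window              # recent calls used for the P95 estimate
        self.total_spend_usd = 0.0
        self.latencies: list[float] = []

    def record(self, cost_usd: float, latency_s: float) -> None:
        """Record one model call, then enforce both hard limits."""
        self.total_spend_usd += cost_usd
        self.latencies = (self.latencies + [latency_s])[-self.window:]

        if self.total_spend_usd > self.max_spend_usd:
            raise CircuitBreakerTripped(
                f"spend ${self.total_spend_usd:.2f} exceeded cap ${self.max_spend_usd:.2f}"
            )
        if len(self.latencies) >= 20:     # wait for a minimally meaningful sample
            p95 = statistics.quantiles(self.latencies, n=20)[-1]
            if p95 > self.max_p95_latency_s:
                raise CircuitBreakerTripped(f"P95 latency {p95:.1f}s exceeded limit")


# Wrap every model call so a runaway loop dies in minutes, not weeks.
breaker = AgentCircuitBreaker(max_spend_usd=100.0, max_p95_latency_s=30.0)
try:
    for step in range(10_000):
        start = time.monotonic()
        # ... the real model/tool call would go here ...
        breaker.record(cost_usd=0.02, latency_s=time.monotonic() - start)
except CircuitBreakerTripped as err:
    print(f"agent halted at step {step}: {err}")
```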
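The grep-and-cat topic is just as easy to sketch. Assuming a sandboxed directory of plain-text documents (the path and character budget below are made up), retrieval becomes a subprocess call to grep, with the matching files concatenated into the prompt verbatim:

```python
import subprocess
from pathlib import Path

DOCS_DIR = Path("/sandbox/docs")  # hypothetical sandboxed corpus of plain-text files


def grep_retrieve(query: str, max_chars: int = 4000) -> str:
    """Old-school retrieval: grep for the query, then cat matching files into context."""
    result = subprocess.run(
        ["grep", "-ril", query, str(DOCS_DIR)],  # -r recurse, -i ignore case, -l list files
        capture_output=True, text=True,
    )
    parts = []
    for path in result.stdout.splitlines():
        parts.append(f"--- {path} ---\n{Path(path).read_text(errors='replace')}")
    return "\n".join(parts)[:max_chars]  # crude character budget instead of top-k ranking


# The agent sees exactly what an engineer running grep and cat would see.
print(grep_retrieve("circuit breaker"))
```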
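Session tainting, named in the safety topic above, is the same move: the guardrail lives in code, not in the prompt. A toy version (the tool names and privileged set are hypothetical): once any untrusted content enters a session, the session is permanently marked, and privileged tools refuse to run no matter what the model asks for.

```python
class Session:
    """Tracks whether any untrusted content has entered this conversation."""

    PRIVILEGED_TOOLS = {"transfer_funds", "update_prescription"}  # hypothetical examples

    def __init__(self) -> None:
        self.tainted = False

    def ingest(self, content: str, trusted: bool) -> str:
        # Taint is one-way: a single untrusted input (web page, email,
        # retrieved chunk) permanently downgrades the whole session.
        if not trusted:
            self.tainted = True
        return content

    def authorize_tool(self, tool_name: str) -> None:
        # Enforced in infrastructure, not in the prompt, so no injected
        # instructions can talk the model past this check.
        if self.tainted and tool_name in self.PRIVILEGED_TOOLS:
            raise PermissionError(f"tainted session; refusing privileged tool {tool_name!r}")


session = Session()
session.ingest("user: pay my electricity bill", trusted=True)
session.ingest("<retrieved web page with hidden instructions>", trusted=False)
try:
    session.authorize_tool("transfer_funds")
except PermissionError as err:
    print(err)  # the guardrail holds regardless of what the model was told
```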
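Shadow mode can be sketched in a few lines too. In this toy version (function and file names are mine, not from the case studies), the model scores every case but only the human expert's decision ships; the paired log is what later yields agreement rates and an evaluation baseline for deciding when the system may act on its own.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class ShadowRecord:
    """One paired decision: what the human did vs. what the model would have done."""
    case_id: str
    human_decision: str
    model_decision: str
    agreed: bool
    timestamp: float


def shadow_decide(case_id: str, payload: dict,
                  human_fn: Callable[[dict], str],
                  model_fn: Callable[[dict], str],
                  log_path: str = "shadow_log.jsonl") -> str:
    """Only the human decision ships; the model runs in parallel and is just logged."""
    human = human_fn(payload)
    try:
        model = model_fn(payload)  # a shadow failure must never affect production
    except Exception as exc:
        model = f"<error: {exc}>"

    record = ShadowRecord(case_id, human, model, human == model, time.time())
    with open(log_path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return human  # the system acts only on the human expert's decision


# Example: the human approves, the shadow model would have declined.
decision = shadow_decide("case-001", {"amount": 120},
                         human_fn=lambda p: "approve",
                         model_fn=lambda p: "decline")
assert decision == "approve"  # production behavior is unchanged
```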
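Finally, just-in-time context. The sketch below assumes a per-tool registry of context fetchers (all names are hypothetical; this is not Elyos AI's published code): nothing is preloaded into the system prompt, and each fetcher runs only at the moment its tool executes, which keeps prompts small and data fresh.

```python
from typing import Callable

# Hypothetical registry mapping each tool to a fetcher that pulls only the
# data that tool needs, at the moment it runs (a DB read, an API call, etc.).
CONTEXT_FETCHERS: dict[str, Callable[[str], str]] = {
    "account_lookup": lambda user_id: f"balance snapshot for {user_id}",
    "order_status": lambda user_id: f"last 5 orders for {user_id}",
}


def run_tool(tool_name: str, user_id: str, llm_call: Callable[[str], str]) -> str:
    """Inject context just in time: fetched at execution, never preloaded or stale."""
    context = CONTEXT_FETCHERS[tool_name](user_id)
    prompt = f"Context (fetched just now):\n{context}\n\nNow execute: {tool_name}"
    return llm_call(prompt)


# With a stub LLM, the returned prompt shows that only the relevant slice was injected.
print(run_tool("order_status", "user-42", llm_call=lambda p: p))
```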
Register to join live or get the recording afterwards.