

How to scale inference for agents
Every agent interaction triggers 10, 20, sometimes 50 LLM calls at the serving layer. Latency stacks across reasoning steps. Reliability drops at compound scale. Most teams are inference-blind here, shipping agents without seeing what's happening underneath.
This roundtable is about seeing things clearly.
Topics we'll cover
Agentic inference economics. What to measure, what to optimize, what to leave alone.
Model routing in production. Large for planning, small for execution, and where it breaks.
Context window explosion across agent steps. KV-cache, summarization, and memory architectures.
Orchestration at production load. What survives real traffic vs. what works in demos.
Facilitator
Vamshi Ambati · ML Leader, Omniva Neo Cloud (acq. Predera); prev. Visa. CMU PhD in AI. https://www.linkedin.com/in/vamshiambati/
Host
Thiyagarajan M · PeakInference Forum & Founder, Kalmantic Labs. https://www.linkedin.com/in/thiyagarajan/
Details
Date: Apr 24, 2026
Duration: 2 hours + break
Location: San Francisco (shared on confirmation)
Size: 15 participants
Format: Invite-only roundtable
Who this is for
You're running agents in production. You have engineers working on inference or orchestration. You've hit compounding latency or reliability issues firsthand.
This won't be useful if you're still prototyping or haven't shipped agents to real users.
To apply
What does your agent architecture look like today?
How many agent runs per day? Average LLM calls per run?
What are your top three inference challenges right now?
What's one thing you want to walk away knowing?
Organized by PeakInference Forum. To learn more, visit peakinference.org