

How to scale inference for agents
Every agent interaction triggers 10, 20, sometimes 50 LLM calls at the serving layer. Latency stacks across reasoning steps. Reliability drops at compound scale. Most teams are inference-blind here, shipping agents without seeing what's happening underneath.
This roundtable is about seeing things clearly.
Topics we'll cover
Agentic inference economics. What to measure, what to optimize, what to leave alone.
Model routing in production. Large for planning, small for execution, and where it breaks.
Context window explosion across agent steps. KV-cache, summarization, and memory architectures.
Orchestration at production load. What survives real traffic vs. what works in demos.
Facilitator
Vamshi Ambati · ML Leader, Omniva Neo Cloud (acq. Predera); prev. Visa. CMU PhD in AI. https://www.linkedin.com/in/vamshiambati/
Host
Thiyagarajan M · PeakInference Forum & Founder, Kalmantic Labs. https://www.linkedin.com/in/thiyagarajan/
Details
Date: Apr 24, 2026
Duration: 2 hours + break
Location: San Francisco (shared on confirmation)
Size: 15 participants
Format: Invite-only roundtable
Who this is for
You're running agents in production. You have engineers working on inference or orchestration. You've hit compounding latency or reliability issues firsthand.
This won't be useful if you're still prototyping or haven't shipped agents to real users.
To apply
What does your agent architecture look like today?
How many agent runs per day? Average LLM calls per run?
What are your top three inference challenges right now?
What's one thing you want to walk away knowing?
Organized by PeakInference Forum. To learn more, visit peakinference.org