

Webinar: Scaling Diffusion Models in Production
Diffusion models are in production everywhere now: image generation, video synthesis, avatars, creative tooling, document processing.
But the engineering reality behind those demos is messier than anyone talks about publicly.
Latency spikes. GPU costs that compound with every model update. Pipelines that weren't designed for real user load. Cold starts that quietly kill conversion. Infrastructure that works fine at 100 requests and falls apart at 10,000.
Most teams are solving this in isolation through incidents, over-provisioning, and trial and error.
This webinar changes that.
Join us for a live panel with engineering leaders and researchers who are actively running diffusion workloads in production. No slide decks. No vendor pitches. Just a moderated, practitioner-level conversation about what's actually working, and what isn't.
What we'll cover:
- Architecting inference pipelines for spiky, bursty diffusion workloads
- GPU cost reality: what optimization actually looks like beyond the theory
- Multi-tenancy in generative workloads: isolation, scheduling, fairness
- Latency vs. quality tradeoffs and how to communicate them to product teams
- Observability for diffusion: the metrics that actually matter
- Where diffusion infrastructure is heading as models get heavier
Moderated by Bharatratna Puli, GTM at Simplismart, who works daily with teams scaling inference across clouds, hyperscalers, and data centers.
Panelists:
- Karthik Kumar, Senior AI Researcher, Tavus
- Rahul Deora, AI Lead, Fynd
More panelists to be announced soon.
Register now!