

True Serverless Inference with Sub-Second Cold Starts
Hosted by Prashanth Manohar
About Event
Cold starts aren’t solved. They’re just hidden behind pre-warmed GPUs.
We’ll show how we:
restore large models in under a second
run multiple models on a single GPU
use vLLM with InferX’s snapshot-based runtime
Live demo + Q&A