

Hands-On with vLLM: Fast Inference & Model Serving Made Simple
Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:
How to install and configure vLLM step by step (see the short sketch after this list)
Best practices for serving models efficiently with continuous batching and PagedAttention
How vLLM compares to other serving frameworks such as Hugging Face's Text Generation Inference (TGI) and hosted inference endpoints
Tips for running vLLM locally and scaling it in the cloud
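To give a taste of what we'll build together, here is a minimal sketch of offline inference with vLLM's Python API. The model name facebook/opt-125m is only a small placeholder for a quick local smoke test; the exact models, prompts, and settings in the live session may differ.

    # Install first: pip install vllm
    from vllm import LLM, SamplingParams

    # Load a small model for a quick local test (placeholder; swap in your own).
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings: temperature/top_p control randomness, max_tokens caps output length.
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() batches prompts internally and returns one result per prompt.
    outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
    print(outputs[0].outputs[0].text)

For serving over HTTP, recent vLLM releases also ship a CLI (vllm serve <model>) that exposes an OpenAI-compatible endpoint; we'll walk through that path in the demo as well.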
This is a practical, no-fluff workshop—you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving