Presented by AgentVersity

Hands-On with vLLM: Fast Inference & Model Serving Made Simple

Zoom
Past Event
About Event

Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.

In this session, you’ll learn:

  • How to install and configure vLLM step by step (see the minimal sketch after this list)

  • Best practices for serving models efficiently with continuous batching and PagedAttention

  • How vLLM compares to other serving frameworks such as Hugging Face's Text Generation Inference (TGI) and Inference Endpoints

  • Tips for running vLLM locally and scaling on the cloud
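
As a taste of the hands-on portion, here is a minimal sketch of installing vLLM and running offline inference. The model name is a placeholder; any Hugging Face causal LM your hardware can hold will do:

    pip install vllm

    # offline_demo.py -- the smallest useful vLLM program
    from vllm import LLM, SamplingParams

    # Placeholder model; swap in the checkpoint you actually want to serve.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["The fastest way to serve an LLM is"], params)
    for out in outputs:
        print(out.outputs[0].text)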

This is a practical, no-fluff workshop: you'll walk away with a running model served via vLLM (a serving sketch follows) and the know-how to deploy your own in production.
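
For a feel of that end state, the sketch below starts vLLM's OpenAI-compatible server and queries it with the openai client. The model name, port, and "EMPTY" key are placeholders for illustration; pass --api-key to the server if you want real authentication:

    vllm serve facebook/opt-125m --port 8000

    # query_demo.py -- talk to the running server over the OpenAI API
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.completions.create(
        model="facebook/opt-125m",  # must match the served model
        prompt="PagedAttention speeds up serving because",
        max_tokens=64,
    )
    print(resp.choices[0].text)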

🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving
