

Hands-On with vLLM: Fast Inference & Model Serving Made Simple
Tired of slow inference and complex serving pipelines? Join us for a live hands-on demo of vLLM, the high-performance inference engine designed for large language models.
In this session, you’ll learn:
How to install and configure vLLM step by step (see the short sketch after this list)
Best practices for serving models efficiently with continuous batching and PagedAttention
How vLLM compares to other serving frameworks such as Hugging Face's Text Generation Inference (TGI) and hosted inference endpoints
Tips for running vLLM locally and scaling it in the cloud
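To give a taste of what we'll build together, here is a minimal sketch of offline inference with vLLM's Python API. The model name facebook/opt-125m is only a small placeholder for a quick local smoke test; the exact models, prompts, and settings in the live session may differ.

    # Install first: pip install vllm
    from vllm import LLM, SamplingParams

    # Load a small model for a quick local test (placeholder; swap in your own).
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings: temperature/top_p control randomness, max_tokens caps output length.
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() batches prompts internally and returns one result per prompt.
    outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
    print(outputs[0].outputs[0].text)

For serving over HTTP, recent vLLM releases also ship a CLI (vllm serve <model>) that exposes an OpenAI-compatible endpoint; we'll walk through that path in the demo as well.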
This is a practical, no-fluff workshop—you’ll walk away with a running model served via vLLM and the know-how to deploy your own in production.
🔹 Format: Live coding + Q&A
🔹 Who’s it for: AI engineers, MLEs, founders, and anyone curious about deploying LLMs at scale
🔹 Takeaway: A working vLLM setup and a deeper understanding of efficient LLM serving