Welcome to the official MLOps Community London chapter 😎 Join us on our #london channel in the MLOps Community Slack: https://mlops.community/join/

[Hands-on Workshop] vLLM Reality Check: Understand, Deploy, Optimize, Automate

Ticket Price
£150.00
About Event

What the Workshop Will Be About:

This workshop will provide a deep dive into vLLM, a high-performance inference engine for large language models.

Participants will explore how vLLM works under the hood, how it optimizes model execution, and how to effectively deploy and manage it in production environments.


Key Points to Cover:

  • vLLM Internals - Understand the request lifecycle, how vLLM manages model families, and how GPU execution is orchestrated for optimal performance.

  • Transformer Architecture & vLLM Optimizations - Revisit transformer architecture foundations and learn how vLLM leverages techniques like continuous batching and PagedAttention to accelerate inference (see the first sketch after this list).

  • Advanced Deployment Strategies - Explore best practices for deploying vLLM across different environments, starting from single-GPU setups (see the serving sketch after this list).

  • Automated Management of Inference Engines - Discuss the shortcomings of self-managed serving, why inference providers exist, and test a deployment flow in Cast.ai.
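
To give a flavour of the hands-on portion, here is a minimal offline-inference sketch using vLLM's Python API; the model name, sampling settings, and memory fraction are illustrative placeholders rather than the workshop's exact configuration:

  # Minimal vLLM offline-inference sketch (model and settings are placeholders)
  from vllm import LLM, SamplingParams

  # vLLM batches these prompts with continuous batching and manages the KV cache
  # with PagedAttention under the hood.
  prompts = [
      "Explain PagedAttention in one sentence.",
      "What does continuous batching improve?",
  ]
  sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

  # Single-GPU setup; gpu_memory_utilization caps the memory pool used for the KV cache.
  llm = LLM(model="facebook/opt-125m", gpu_memory_utilization=0.90)

  for output in llm.generate(prompts, sampling_params):
      print(output.prompt, "->", output.outputs[0].text)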
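
And a sketch of the serving side, assuming a vLLM OpenAI-compatible server is already running locally (for example, one started with the "vllm serve" command); the endpoint, model name, and prompt are placeholders:

  # Query a locally running vLLM server through its OpenAI-compatible API
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
  response = client.chat.completions.create(
      model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; must match the served model
      messages=[{"role": "user", "content": "What is PagedAttention?"}],
  )
  print(response.choices[0].message.content)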


Expected Outcomes:

  • Unpack the vLLM deployment struggle

  • Benchmark performance gaps between manual and optimized configurations

  • Discover GPU optimization challenges - availability, selection, and scaling issues

  • See automated deployment solve these challenges - same benchmark, better results

  • You'll be the one running the commands, hitting walls, and discovering why leading teams automate their AI infrastructure with AI Enabler instead of building it from scratch.


Who it’s for:

AI/ML Engineers, MLOps and LLMOps practitioners, DevOps Engineers, and Platform Engineers running or planning to run models in production.


Pre-Workshop Requirements:

A laptop & basic knowledge of Kubernetes and LLMs


Meet Your Conductor:

Igor Šušić, a talented Staff Machine Learning Engineer, will be driving the masterclass.


Food, drinks and good vibes will be provided during the workshop.

Location
Hard Rock Cafe
Criterion Building, 225-229 Piccadilly, London W1J 9HR, UK