

vLLM Inference Meetup · Boston
Deep technical sessions. Live demos. Real conversations.
If you're deploying or scaling LLM inference, this is the room to be in.
Join Red Hat AI, IBM, NVIDIA, The Open Accelerator, MIT, and the vLLM community in Boston for an evening of technical depth:
Hear directly from vLLM maintainers and committers
See live demos of real inference workflows
Learn how to put what you learn into practice with the Open Accelerator
Connect with the engineers and platform teams pushing the state of the art
Program
Optional Pre-Event Workshop
3:30 PM — Doors Open for Workshop Attendees
4:00–5:00 PM — Distributed Inference with llm-d: Your Production-Ready Path to Scalable LLM Inference
About the workshop: llm-d is a distributed inference orchestration layer that reduces tail latency (P95/P99) through intelligent cache-aware routing. In this hands-on workshop, participants deploy Llama 3.1 8B with vLLM, benchmark single-GPU performance, scale to multiple GPUs with naive load balancing, and then use llm-d to demonstrate how cache-aware routing significantly reduces tail latency.
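To preview the benchmarking step, here is a minimal sketch of how you might measure P50/P95/P99 end-to-end latency against a vLLM server. It assumes a server started with `vllm serve meta-llama/Llama-3.1-8B-Instruct`, which exposes vLLM's OpenAI-compatible API on the default port 8000; the prompt, request count, and sequential request loop are illustrative placeholders, not the workshop's actual harness.

```python
# Minimal latency-percentile sketch (assumes a local vLLM server started with:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
# which serves an OpenAI-compatible API on port 8000 by default).
import time
import statistics
import requests

BASE_URL = "http://localhost:8000/v1/completions"  # vLLM's OpenAI-compatible endpoint
MODEL = "meta-llama/Llama-3.1-8B-Instruct"
PROMPT = "Explain cache-aware routing in one sentence."  # illustrative prompt

def measure_latencies(n_requests: int = 50) -> list[float]:
    """Send n sequential completion requests and record end-to-end latency in seconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        resp = requests.post(BASE_URL, json={
            "model": MODEL,
            "prompt": PROMPT,
            "max_tokens": 64,
        })
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    lat = measure_latencies()
    # statistics.quantiles with n=100 returns the 99 percentile cut points
    cuts = statistics.quantiles(lat, n=100)
    p50, p95, p99 = cuts[49], cuts[94], cuts[98]
    print(f"P50={p50*1000:.0f} ms  P95={p95*1000:.0f} ms  P99={p99*1000:.0f} ms")
```

In the workshop, the same measurement is repeated after scaling to multiple GPUs behind a naive load balancer, and again behind llm-d's cache-aware router, so the tail-latency difference can be compared directly.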
Meetup Agenda
5:00–5:30 PM — Doors Open, Check-In
5:30–5:40 PM — Welcome and Opening Remarks
Saša Zelenović, Sr. Technical Marketing Manager, Red Hat AI
5:40–6:00 PM — Intro to vLLM and Project Update
Michael Goin, vLLM Maintainer and Principal Engineer, Red Hat AI
6:00–6:20 PM — Getting Started with Model Compression for Fast and Efficient Inference
Charles Hernandez, ML Engineer, Red Hat AI
6:20–6:40 PM — Accelerating LLM Inference with Speculative Decoding
Helen Zhao, ML Engineer, Red Hat AI
Fynn Schmitt-Ulms, ML Engineer, Red Hat AI
6:40–7:00 PM — Agentic AI with vLLM
Dhruv Nandakumar, Agent and Inference Engineering, NVIDIA
7:00–7:30 PM — Tackling Distributed Inference at Scale with llm-d and Kubernetes
Carlos Costa, Distinguished Engineer, IBM
7:30–7:40 PM — From Meetup to Hackathon: Building Together in the Open Accelerator
Stefanie Chiras, SVP, The Open Accelerator, Red Hat
7:40–8:00 PM — Community Discussion and Q&A
8:00–9:00 PM — Networking, Food and Drinks
Who Should Come
vLLM users and contributors
ML and infra engineers working on inference and serving
Platform teams running GenAI in production
Anyone curious about efficient inference across local, cloud, and Kubernetes
Before You Arrive
Registration closes 24 hours before the event
Unregistered attendees cannot be admitted
Bring a photo ID for check-in
See you in Boston! The inference conversation starts here.