

vLLM Inference Meetup · Boston
Deep technical sessions. Live demos. Real conversations.
If you're deploying or scaling LLM inference, this is the room to be in.
Join Red Hat AI, IBM, NVIDIA, The Open Accelerator, MIT, and the vLLM community in Boston for an evening of technical depth:
Hear directly from vLLM maintainers and committers
See live demos of real inference workflows
Learn how to put what you learn into practice with the Open Accelerator
Connect with the engineers and platform teams pushing the state of the art
Program
Optional Pre-Event Workshop
3:30 PM — Doors Open for Workshop Attendees
4:00–5:00 PM — Distributed Inference with llm-d: Your Production-Ready Path to Scalable LLM Inference
About the workshop: llm-d is a distributed inference orchestration layer that reduces tail latency (P95/P99) through intelligent cache-aware routing. In this hands-on workshop, participants deploy Llama 3.1 8B with vLLM, benchmark single-GPU performance, scale to multiple GPUs with naive load balancing, and then use llm-d to demonstrate how cache-aware routing significantly reduces tail latency.
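To preview the benchmarking step, here is a minimal sketch of how you might measure P50/P95/P99 end-to-end latency against a vLLM server. It assumes a server started with `vllm serve meta-llama/Llama-3.1-8B-Instruct`, which exposes vLLM's OpenAI-compatible API on the default port 8000; the prompt, request count, and sequential request loop are illustrative placeholders, not the workshop's actual harness.

```python
# Minimal latency-percentile sketch (assumes a local vLLM server started with:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
# which serves an OpenAI-compatible API on port 8000 by default).
import time
import statistics
import requests

BASE_URL = "http://localhost:8000/v1/completions"  # vLLM's OpenAI-compatible endpoint
MODEL = "meta-llama/Llama-3.1-8B-Instruct"
PROMPT = "Explain cache-aware routing in one sentence."  # illustrative prompt

def measure_latencies(n_requests: int = 50) -> list[float]:
    """Send n sequential completion requests and record end-to-end latency in seconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        resp = requests.post(BASE_URL, json={
            "model": MODEL,
            "prompt": PROMPT,
            "max_tokens": 64,
        })
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return latencies

if __name__ == "__main__":
    lat = measure_latencies()
    # statistics.quantiles with n=100 returns the 99 percentile cut points
    cuts = statistics.quantiles(lat, n=100)
    p50, p95, p99 = cuts[49], cuts[94], cuts[98]
    print(f"P50={p50*1000:.0f} ms  P95={p95*1000:.0f} ms  P99={p99*1000:.0f} ms")
```

In the workshop, the same measurement is repeated after scaling to multiple GPUs behind a naive load balancer, and again behind llm-d's cache-aware router, so the tail-latency difference can be compared directly.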
Meetup Agenda
5:00–5:30 PM — Doors Open, Check-In
5:30–5:40 PM — Welcome and Opening Remarks
Saša Zelenović, Sr. Technical Marketing Manager, Red Hat AI
5:40–6:00 PM — Intro to vLLM and Project Update
Michael Goin, vLLM Maintainer and Principal Engineer, Red Hat AI
6:00–6:20 PM — Getting Started with Model Compression for Fast and Efficient Inference
Charles Hernandez, ML Engineer, Red Hat AI
6:20–6:40 PM — Accelerating LLM Inference with Speculative Decoding
Helen Zhao, ML Engineer, Red Hat AI
Fynn Schmitt-Ulms, ML Engineer, Red Hat AI
6:40–7:00 PM — Agentic AI with vLLM
Dhruv Nandakumar, Agent and Inference Engineering, NVIDIA
7:00–7:30 PM — Tackling Distributed Inference at Scale with llm-d and Kubernetes
Carlos Costa, Distinguished Engineer, IBM
7:30–7:40 PM — From Meetup to Hackathon: Building Together in the Open Accelerator
Stefanie Chiras, SVP, The Open Accelerator, Red Hat
7:40–8:00 PM — Community Discussion and Q&A
8:00–9:00 PM — Networking, Food and Drinks
Who Should Come
vLLM users and contributors
ML and infra engineers working on inference and serving
Platform teams running GenAI in production
Anyone curious about efficient inference across local, cloud, and Kubernetes
Before You Arrive
Registration closes 24 hours before the event
Unregistered attendees cannot be admitted
Bring a photo ID for check-in
See you in Boston! The inference conversation starts here.