vLLM Inference Meetup · Boston
Join the vLLM community to discuss optimizing LLM inference!
61 Going
Registration
Approval Required
Your registration is subject to host approval.
About Event

Deep technical sessions. Live demos. Real conversations.

If you're deploying or scaling LLM inference, this is the room to be in.

Join Red Hat AI, IBM, NVIDIA, The Open Accelerator, MIT, and the vLLM community in Boston for an evening of technical depth:

  • Hear directly from vLLM maintainers and committers

  • See live demos of real inference workflows

  • Learn how to put what you learn into practice with the Open Accelerator

  • Connect with the engineers and platform teams pushing the state of the art

Program

Optional Pre-Event Workshop

3:30 PM — Doors Open for Workshop Attendees

4:00–5:00 PM — Distributed Inference with llm-d: Your Production-Ready Path to Scalable LLM Inference

About the workshop: llm-d is a distributed inference orchestration layer that reduces tail latency (P95/P99) through intelligent cache-aware routing. In this hands-on workshop, participants deploy Llama 3.1 8B with vLLM, benchmark single-GPU performance, scale to multiple GPUs with naive load balancing, and then use llm-d to demonstrate how cache-aware routing significantly reduces tail latency.
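To see why cache-aware routing helps tail latency, consider a toy simulation (purely illustrative, not the llm-d implementation, with made-up latency numbers): multi-turn sessions whose prefixes are cached per replica. Naive round-robin scatters a session across replicas, so each turn re-pays the prefill cost; pinning a session to one replica turns repeat turns into cache hits, which shows up directly at P95.

```python
# Toy cost model (assumed numbers, for illustration only): a prefix-cache hit
# is fast, a miss that must prefill from scratch is slow.
HIT_MS, MISS_MS = 20.0, 200.0

def simulate(route, n_replicas=4, n_sessions=50, turns=32):
    """Return per-request latencies under a given routing policy."""
    caches = [set() for _ in range(n_replicas)]  # sessions each replica has cached
    latencies = []
    for turn in range(turns):
        for session in range(n_sessions):
            replica = route(session, turn, n_replicas)
            latencies.append(HIT_MS if session in caches[replica] else MISS_MS)
            caches[replica].add(session)
    return latencies

def naive(session, turn, n):
    # Round-robin that ignores cache placement: a session's turns
    # land on different replicas, repeatedly missing the prefix cache.
    return (session + turn) % n

def cache_aware(session, turn, n):
    # Pin each session to one replica: only the first turn misses.
    return session % n

def p95(xs):
    return sorted(xs)[int(0.95 * len(xs)) - 1]

print(f"naive P95: {p95(simulate(naive))} ms")
print(f"cache-aware P95: {p95(simulate(cache_aware))} ms")
```

In this toy setup, naive routing leaves well over 5% of requests as cache misses, so the slow miss latency dominates P95; cache-aware routing drops the miss rate below that threshold and P95 collapses to the hit latency. The workshop demonstrates the real effect with Llama 3.1 8B on actual GPUs.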

Meetup Agenda

5:00–5:30 PM — Doors Open, Check-In

5:30–5:40 PM — Welcome and Opening Remarks

Saša Zelenović, Sr. Technical Marketing Manager, Red Hat AI

5:40–6:00 PM — Intro to vLLM and Project Update

Michael Goin, vLLM Maintainer and Principal Engineer, Red Hat AI

6:00–6:20 PM — Getting Started with Model Compression for Fast and Efficient Inference

Charles Hernandez, ML Engineer, Red Hat AI

6:20–6:40 PM — Accelerating LLM Inference with Speculative Decoding

Helen Zhao, ML Engineer, Red Hat AI
Fynn Schmitt-Ulms, ML Engineer, Red Hat AI

6:40–7:00 PM — Agentic AI with vLLM

Dhruv Nandakumar, Agent and Inference Engineering, NVIDIA

7:00–7:30 PM — Tackling Distributed Inference at Scale with llm-d and Kubernetes

Carlos Costa, Distinguished Engineer, IBM

7:30–7:40 PM — From Meetup to Hackathon: Building Together in the Open AI Accelerator

Stefanie Chiras, SVP, The Open Accelerator, Red Hat

7:40–8:00 PM — Community Discussion and Q&A

8:00–9:00 PM — Networking, Food and Drinks

Who Should Come

  • vLLM users and contributors

  • ML and infra engineers working on inference and serving

  • Platform teams running GenAI in production

  • Anyone curious about efficient inference across local, cloud, and Kubernetes

Before You Arrive

  • Registration closes 24 hours before the event

  • Unregistered attendees cannot be admitted

  • Bring a photo ID for check-in

See you in Boston! The inference conversation starts here.

Location
314 Main St
Cambridge, MA 02142, USA
The meetup will take place on the 4th floor. Come to the lobby and we'll point you in the right direction.