Open Source Distributed AI Inference (llm-d/vLLM) Meetup

Cambridge, MA
Registration
Approval Required
Your registration is subject to host approval.
Welcome! To join the event, please register below.
About Event

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge

Hosted by Google Cloud, Red Hat AI, and the llm-d Community

Date: Thursday, May 28, 2026

Event Overview

Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This session will focus on the evolution of llm-d, from the upcoming 0.7 release to specialized hardware acceleration on TPUs and NVIDIA GPUs.

What to Expect

  • Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale

  • Live demos focused on real distributed workflows

  • Great networking with food and drinks

Who Should Attend

  • ML and Infrastructure Engineers focused on high-throughput serving.

  • Platform Architects building GenAI stacks on Kubernetes or Cloud.

  • Open-source contributors interested in the future of distributed LLM orchestration.

Meetup Agenda

The agenda is preliminary and subject to speaker confirmation.

5:00pm — Doors Open & Check-In

Security check-in, networking, and light refreshments.

5:30pm — Intro to llm-d & the 0.7 Roadmap

  • Speaker: Tyler Michael Smith, llm-d Core Maintainer, Red Hat

  • Topic: An overview of the llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.

6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d

  • Speaker: Sean Horgan, Google Cloud Engineering

  • Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.

6:30pm — Using llm-d for Efficient Inference at Scale

  • Speaker: Peter Tanski, Capital One

  • Topic: Peter will share Capital One's lessons learned from adopting llm-d as a central component for serving open-source LLMs at scale, covering GPU utilization, mixed workloads, and efficient inference.

7:00pm — Additional Speakers TBD

7:30pm — Networking, Food, and Drinks 🍕🤝

  • Deep-dive conversations with the maintainers and local Boston/Cambridge AI community.

8:30pm — Event Ends

Location
Please register to see the exact location of this event.
Cambridge, MA