Presented by vLLM Meetups and Events

Join the vLLM community to discuss optimizing LLM inference!

153 Went

Featured in Boston Tech Week

Open Source Distributed AI Inference (llm-d/vLLM) Meetup

Name: Open Source Distributed AI Inference (llm-d/vLLM) Meetup
Start: 2026-05-28T17:00:00.000-04:00
End: 2026-05-28T20:30:00.000-04:00
Location: Cambridge, MA

Registration Closed

This event is not currently taking registrations. You may contact the host or subscribe to receive updates.

About Event

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge

Hosted by Google Cloud, Red Hat AI, and the llm-d Community

Date: Thursday, May 28, 2026

Event Overview

Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. The sessions will trace the evolution of llm-d, from the recent 0.7 release to specialized hardware acceleration on Google TPUs.

What to Expect

Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale
Live demos focused on real distributed workflows
Great networking with food and drinks

Who Should Attend

ML and infrastructure engineers focused on high-throughput serving.
Platform architects building GenAI stacks on Kubernetes or in the cloud.
Open-source contributors interested in the future of distributed LLM orchestration.

Meetup Agenda

The agenda is preliminary and subject to speaker confirmation.

5:00pm — Doors Open & Check-In

Security check-in, networking, and light refreshments.

5:30pm — Intro to llm-d & The 0.8 Roadmap

Speaker: Tyler Michael Smith, llm-d Core Maintainer, Red Hat
Topic: An overview of the recent llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.

6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d

Speaker: Kaushik Mitra, Google Cloud Engineering
Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.

6:30pm — Using llm-d for Efficient Inference at Scale

Speaker: Peter Tanski, Capital One
Topic: Peter will share Capital One's lessons from adopting llm-d as a central component for serving open-source LLMs at scale, addressing GPU utilization, mixed workloads, and efficient inference.

7:00pm — Additional Topics (still TBD)

We are drafting a list of brief updates to cover live:

Inference performance analysis with Prism (https://prism.llm-d.ai), Sean Horgan
KV cache offloading
TPU7x overview, Liat Berry

7:30pm — Networking, Food, and Drinks 🍕🤝

Deep-dive conversations with the maintainers and the local Boston/Cambridge AI community.

8:30pm — Event Ends

Location

Please register to see the exact location of this event.

Cambridge, MA
