

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge
Hosted by Google Cloud, Red Hat AI, and the llm-d Community
Date: Thursday, May 28, 2026
Event Overview
Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This session will focus on the evolution of llm-d, ranging from the upcoming 0.7 release to specialized hardware acceleration on TPUs and NVIDIA GPUs.
What to Expect
Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale
Live demos focused on real distributed workflows
Great networking with food and drinks
Who Should Attend
ML and Infrastructure Engineers focused on high-throughput serving.
Platform Architects building GenAI stacks on Kubernetes or Cloud.
Open-source contributors interested in the future of distributed LLM orchestration.
Meetup Agenda
The agenda is preliminary and subject to speaker confirmation.
5:00pm — Doors Open & Check-In
Security check-in, networking, and light refreshments.
5:30pm — Intro to llm-d & The 0.7 Roadmap
Speaker: Tyler Michael Smith - llm-d Core Maintainer, Red Hat
Topic: An overview of the llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.
6:00pm — Achieving State-of-the-Art Inference: High Performance on TPUs and GPUs with llm-d
Speaker: Sean Horgan, Google Cloud Engineering
Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.
6:30pm — Using llm-d for Efficient Inference at Scale
Speaker: Peter Tanski, Capital One
Topic: Peter will share Capital One's lessons learned from adopting llm-d as a central component for serving open-source LLMs at scale, covering GPU utilization, mixed workloads, and efficient inference.
7:00pm — Additional Speakers TBD
7:30pm — Networking, Food, and Drinks 🍕🤝
Deep-dive conversations with the maintainers and local Boston/Cambridge AI community.
8:30pm — Event Ends