Open Source Distributed AI Inference (llm-d/vLLM) Meetup

Cambridge, MA
Registration
Approval Required
Your registration is subject to host approval.
Welcome! To join the event, please register below.
About Event

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge

Hosted by Google Cloud, Red Hat AI, and the llm-d Community

Date: Thursday, May 28, 2026

Event Overview

Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This session will focus on the evolution of llm-d, from the upcoming 0.7 release to specialized hardware acceleration on TPUs and NVIDIA GPUs.

What to Expect

  • Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale

  • Live demos focused on real distributed workflows

  • Great networking with food and drinks

Who Should Attend

  • ML and Infrastructure Engineers focused on high-throughput serving.

  • Platform Architects building GenAI stacks on Kubernetes or Cloud.

  • Open-source contributors interested in the future of distributed LLM orchestration.

Meetup Agenda

The agenda is preliminary and subject to speaker confirmation.

5:00pm — Doors Open & Check-In

Security check-in, networking, and light refreshments.

5:30pm — Intro to llm-d & the 0.7 Roadmap

  • Speaker: Tyler Michael Smith, llm-d Core Maintainer, Red Hat

  • Topic: An overview of the llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.

6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d

  • Speaker: Sean Horgan, Google Cloud Engineering

  • Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.

6:30pm — Using llm-d for Efficient Inference at Scale

  • Speaker: Peter Tanski, Capital One

  • Topic: Peter will share Capital One's lessons learned from adopting llm-d as a central component for serving open-source LLMs at scale, covering GPU utilization, mixed workloads, and efficient inference.

7:00pm — Additional Speakers TBD

7:30pm — Networking, Food, and Drinks 🍕🤝

  • Deep-dive conversations with the maintainers and local Boston/Cambridge AI community.

8:30pm — Event Ends

Location
Please register to see the exact location of this event.
Cambridge, MA