vLLM Meetups and Events
Join the vLLM community to discuss optimizing LLM inference!
153 Went

Open Source Distributed AI Inference (llm-d/vLLM) Meetup

Register to See Address
Cambridge, MA
About Event

Open Source Distributed AI Inference (llm-d/vLLM) Meetup Boston/Cambridge

Hosted by Google Cloud, Red Hat AI, and the llm-d Community

Date: Thursday, May 28, 2026

Event Overview

Join us at Google’s Cambridge office for a deep dive into the latest advancements in open-source distributed inference. This session will focus on the evolution of llm-d, ranging from the recent 0.7 release to specialized hardware acceleration on Google TPUs.

What to Expect

  • Deep technical sessions from llm-d maintainers, committers, and teams using AI at scale

  • Live demos focused on real distributed workflows

  • Great networking with food and drinks

Who Should Attend

  • ML and Infrastructure Engineers focused on high-throughput serving.

  • Platform Architects building GenAI stacks on Kubernetes or Cloud.

  • Open-source contributors interested in the future of distributed LLM orchestration.

Meetup Agenda

The agenda is preliminary and subject to speaker confirmation.

5:00pm — Doors Open & Check-In

Security check-in, networking, and light refreshments.

5:30pm — Intro to llm-d & The 0.8 Roadmap

  • Speaker: Tyler Michael Smith, llm-d Core Maintainer, Red Hat

  • Topic: An overview of the recent llm-d 0.7 release, the future of distributed inference, and how the community is evolving to meet the demands of next-gen model architectures.

6:00pm — Achieve state-of-the-art inference: High performance on TPUs and GPUs with llm-d

  • Speaker: Kaushik Mitra, Google Cloud Engineering

  • Topic: This session dives deep into how to architect disaggregated serving and automatic key-value cache storage tiering on Ironwood (TPU7x). Learn to implement routing optimized for service-level objectives and build a portable, high-performance inference fleet that scales automatically based on real-time server conditions.

6:30pm — Using llm-d for Efficient Inference at Scale

  • Speaker: Peter Tanski, Capital One

  • Topic: Peter will share lessons learned from implementing llm-d as a central component for serving open-source LLMs at scale, covering GPU utilization, mixed workloads, and efficient inference.

7:00pm — Additional Topics (TBD)

We are drafting a list of brief updates to cover live:

  • Inference performance analysis with Prism (https://prism.llm-d.ai), Sean Horgan

  • KV cache offloading

  • TPU7x overview, Liat Berry

7:30pm — Networking, Food, and Drinks 🍕🤝

  • Deep-dive conversations with the maintainers and the local Boston/Cambridge AI community.

8:30pm — Event Ends
