

London vLLM & llm-d Inference Meetup
Hosted by Red Hat AI, NVIDIA, and Stelia AI, this event takes place on 10 June 2026 in London, UK.
Join us for a deep dive into the engine room of vLLM and llm-d inference, where we will focus on the architecture, optimizations, and raw engineering required to run inference at scale.
Whether you’re looking to squeeze every last token out of your GPU cluster or you’re curious about the latest commits to the vLLM and llm-d ecosystems, this is the room you want to be in.
What to Expect
Deep Technical Sessions: Hear directly from the maintainers and core committers of vLLM and llm-d.
Scale in Production: Learn from industry leaders about deploying LLMs in production.
Live Demos: Watch demos built around real-world workflows.
Networking: Stick around for food and drinks. It’s a great chance to chat with the speakers and exchange ideas with fellow developers and engineers.
Who Should Attend
vLLM and llm-d users and contributors
ML and infra engineers working on inference and serving
Platform teams running GenAI in production
Anyone curious about efficient inference across local, cloud, and Kubernetes environments
Agenda (Subject to More Awesomeness)
17:00 – 17:30 — Doors Open, Check-In
17:30 – 17:40 — Welcome and Opening Remarks
Sasa Zelenovic, Sr. Technical Marketing Manager, Red Hat AI
17:40 – 18:10 — Intro to vLLM and Project Update
Michael Goin, vLLM Core Committer and Principal Engineer, Red Hat AI
18:10 – 18:30 — Accelerating AI Inference with Speculative Decoding
Eldar Kurtić, Principal Research Scientist, Red Hat AI & ISTA
18:30 – 18:50 — From Eval to Production: Building a Managed vLLM Service with llm-d
David Hughes, Chief Technology Officer, Stelia AI
18:50 – 19:10 — Model Express: From Cold Start to Hot Tokens
Ganesh Kudleppanavar, System Software Manager, NVIDIA
19:10 – 19:20 — Test, Defend, Prove: Data-Driven AI Safety
Stuart Battersby, AI Safety & Model Evaluation Architect, Red Hat AI
19:20 – 19:40 — Discussion and Q&A
19:40 – 21:00 — Networking, Food and Drinks
Important Information
Registration closes 24 hours before the event. We cannot admit unregistered attendees.
Please bring a photo ID to verify your registration on arrival.
See you in London
If you are building, deploying, or scaling inference, this is the room to be in.
See you soon!