London vLLM & llm-d Inference Meetup
vLLM Meetups and Events
Join the vLLM community to discuss optimizing LLM inference!
Registration
Approval Required
Your registration is subject to host approval.
About Event

United Kingdom vLLM & llm-d Inference Meetup

Hosted by Red Hat AI, NVIDIA, and Stelia AI, this event takes place on 10 June 2026 in London, UK.

Join us for a deep dive into the engine room of vLLM and llm-d inference, focusing on the architecture, optimizations, and engineering required to run inference at scale.

Whether you’re looking to squeeze every last token out of your GPU cluster or you're curious about the latest commits to the vLLM and llm-d ecosystems, this is the room you want to be in.

What to Expect

  • Deep Technical Sessions: Hear directly from the maintainers and core committers of vLLM and llm-d

  • Scale in Production: Learn from industry leaders about deploying LLMs in production

  • Live Demos: See live demos focused on real-world workflows

  • Networking: Stick around for food and drinks. It’s a great chance to chat with the speakers and exchange ideas with fellow developers and engineers.

Who Should Attend

  • vLLM and llm-d users and contributors

  • ML and infra engineers working on inference and serving

  • Platform teams running GenAI in production

  • Anyone curious about efficient inference across local, cloud, and Kubernetes

Agenda (Subject to More Awesomeness)

17:00 – 17:30 — Doors Open, Check-In

17:30 – 17:40 — Welcome and Opening Remarks

Sasa Zelenovic, Sr. Technical Marketing Manager, Red Hat AI

17:40 – 18:10 — Intro to vLLM and Project Update

Michael Goin, vLLM Core Committer and Principal Engineer, Red Hat AI

18:10 – 18:30 — Accelerating AI Inference with Speculative Decoding

Eldar Kurtić, Principal Research Scientist, Red Hat AI & ISTA

18:30 – 18:50 — From Eval to Production: Building a Managed vLLM Service with llm-d

David Hughes, Chief Technology Officer, Stelia AI

18:50 – 19:10 — Model Express: From Cold Start to Hot Tokens

Ganesh Kudleppanavar, System Software Manager, NVIDIA

19:10 – 19:20 — Test, Defend, Prove: Data-Driven AI Safety

Stuart Battersby, AI Safety & Model Evaluation Architect, Red Hat AI

19:20 – 19:40 — Discussion and Q&A

19:40 – 21:00 — Networking, Food and Drinks

Important information

Registration closes 24 hours before the event. We cannot admit unregistered attendees.

Please bring a photo ID to verify your registration on arrival.

See you in London

If you are building, deploying, or scaling inference, this is the room to be in.

See you soon!

Location
Sustainable Ventures
County Hall, 5th Floor, Belvedere Rd, London SE1 7PB, UK