
vLLM Inference Meetup: Bangalore, India

Registration
Approval Required
Your registration is subject to approval by the host.
Welcome! To join the event, please register below.
About Event

Want to cut your LLM inference costs by 50-70% while handling 2-3x more concurrent users?

Join experts from Red Hat, NxtGen, and Harness.io, your hosts for Bangalore's premier vLLM meetup, where you'll learn production-ready techniques for optimizing LLM inference. This is your chance to learn directly from engineers running vLLM at scale, get hands-on time on NVIDIA GPUs (provided free!), and walk away with strategies that deliver exactly those savings.

What makes this unmissable:

  • Free NVIDIA GPU instances for every attendee: deploy and benchmark real workloads

  • Production insights from engineers at Red Hat, NxtGen, and Harness.io

  • Hands-on lab session where you'll optimize actual vLLM deployments

  • Network with ML engineers, DevOps practitioners, and AI infrastructure experts solving similar challenges

  • Learn PagedAttention, KV cache optimization, and intelligent model routing techniques you can implement immediately (a minimal deployment sketch follows this list)
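
To make that concrete, here is a minimal sketch of the kind of deployment you'll tune in the lab. It is illustrative, not the lab's exact setup: the model name is a placeholder, and the knobs shown are standard KV-cache-related options in recent vLLM releases.

```python
# Minimal vLLM sketch. PagedAttention is built in; the settings below are
# the KV-cache knobs you'll be tuning. Model name and values are examples.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; swap for the lab's model
    gpu_memory_utilization=0.90,   # fraction of VRAM vLLM may claim for weights + KV cache
    max_model_len=4096,            # caps per-request KV cache growth
    enable_prefix_caching=True,    # reuse KV blocks across prompts with shared prefixes
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why does batching raise GPU utilization?"], params)
print(outputs[0].outputs[0].text)
```

PagedAttention itself needs no flag: it is vLLM's default memory manager. The settings above only control how much VRAM it can page into and whether KV blocks are shared across prompts.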

Come Prepared With

  • Laptop with SSH client (we provide the GPU instances!); a quick sanity check you can run once connected follows this list

  • Government-issued photo ID for venue security

  • Your toughest LLM deployment challenges and questions
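
Once you SSH into your instance, a quick check like the sketch below confirms the GPU is visible. It assumes PyTorch is installed (vLLM pulls it in as a dependency); this is an illustration, not an official lab script.

```python
# Sanity check for the provided GPU instance after SSH-ing in.
import torch

if torch.cuda.is_available():
    dev = torch.cuda.get_device_properties(0)
    print(f"GPU: {dev.name}, VRAM: {dev.total_memory / 2**30:.1f} GiB")
else:
    print("No CUDA device visible; check drivers with nvidia-smi.")
```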

Session Lineup

14:30 - Registration & Opening Remarks (Red Hat & NxtGen)

15:00 - Keynote: Turning GenAI Investments into Results - Why Inference Matters
Understand the economics and technical requirements of production inference

15:30 - High-Performance LLM Inference with vLLM - Prasad Mukhedkar
Deep dive into PagedAttention and KV cache optimizations that unlock 2-3x GPU utilization
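
For a feel of why the KV cache is the bottleneck this talk attacks, here is a back-of-envelope calculation assuming a Llama-3-8B-style configuration (32 layers, 8 KV heads via grouped-query attention, head dimension 128, fp16). The figures are illustrative assumptions, not numbers from the talk.

```python
# Back-of-envelope KV cache math for a Llama-3-8B-style config (assumed):
# 32 layers, 8 KV heads (grouped-query attention), head_dim 128, fp16.
layers, kv_heads, head_dim, bytes_fp16 = 32, 8, 128, 2

kv_per_token = 2 * layers * kv_heads * head_dim * bytes_fp16  # K and V tensors
print(f"KV cache per token: {kv_per_token / 1024:.0f} KiB")   # 128 KiB

seq_len, batch = 4096, 32
total = kv_per_token * seq_len * batch
print(f"32 requests x 4096 tokens: {total / 2**30:.1f} GiB")  # 16 GiB
```

Reserving that worst case up front for every request is what strands GPU memory; PagedAttention instead allocates fixed-size KV blocks on demand, which is where the extra utilization comes from.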

16:00 - Scaling Distributed LLM Inference with llm-d - TBA
Intelligent routing strategies that reduce costs 30-50% while maintaining SLAs
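
llm-d itself is configured at the Kubernetes layer, so the sketch below is not its API; it is only a toy illustration of the cost-versus-SLA trade-off such routing makes, with every name and number hypothetical.

```python
# Toy cost-aware router, NOT llm-d's actual API: pick the cheapest replica
# whose current queue depth still fits the latency budget.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    cost_per_1k_tokens: float  # hypothetical accounting unit
    queue_depth: int           # in-flight requests
    max_queue_for_sla: int     # depth beyond which the SLA is at risk

def route(replicas: list[Replica]) -> Replica:
    ok = [r for r in replicas if r.queue_depth < r.max_queue_for_sla]
    pool = ok or replicas  # degrade gracefully if every replica is hot
    return min(pool, key=lambda r: (r.cost_per_1k_tokens, r.queue_depth))

replicas = [
    Replica("a100-spot", 0.4, 7, 8),
    Replica("h100-dedicated", 1.0, 2, 16),
]
print(route(replicas).name)  # "a100-spot" while it still meets the SLA
```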

16:30 - NxtGen's Agentic Platform - Abhishek Kumar (NxtGen)
Real-world agent architectures and deployment patterns

17:00 - vLLM Semantic Routing - Ompragash (Harness)
Smart request routing across models for optimal cost-performance trade-offs
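
As a purely illustrative stand-in for semantic routing (production routers embed requests with a sentence encoder), here is a toy sketch that routes by bag-of-words similarity; the model names are hypothetical placeholders.

```python
# Toy semantic router: route each prompt to the model whose topic profile
# it most resembles. Bag-of-words cosine stands in for real embeddings.
from collections import Counter
from math import sqrt

ROUTES = {  # hypothetical model names and topic profiles
    "small-fast-model": "hi hello thanks summarize translate short",
    "large-reasoning-model": "prove derive debug algorithm architecture design",
}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(prompt: str) -> str:
    p = Counter(prompt.lower().split())
    return max(ROUTES, key=lambda m: cosine(p, Counter(ROUTES[m].split())))

print(route("please debug this algorithm"))  # -> large-reasoning-model
```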

17:30 - ☕ Networking Break with Snacks

18:00-19:00 - 🔥 Hands-on Lab: vLLM Inference with NVIDIA GPUs
Deploy, benchmark, and optimize real inference workloads with expert guidance
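
If you want a preview of the benchmarking half of the lab, here is a minimal sketch of a concurrency test against vLLM's OpenAI-compatible server; the URL and model name are placeholders for whatever your lab instance serves.

```python
# Minimal concurrency benchmark against a vLLM OpenAI-compatible endpoint.
# URL and model name are placeholders. Requires the `requests` package.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/completions"  # vLLM's OpenAI-compatible API
BODY = {"model": "meta-llama/Llama-3.1-8B-Instruct",
        "prompt": "Explain KV caching in one sentence.", "max_tokens": 64}

def one_request(_):
    t0 = time.perf_counter()
    r = requests.post(URL, json=BODY, timeout=120)
    r.raise_for_status()
    return time.perf_counter() - t0

n, workers = 32, 8
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=workers) as pool:
    latencies = list(pool.map(one_request, range(n)))
wall = time.perf_counter() - t0
print(f"{n} requests, {workers} concurrent: {n / wall:.1f} req/s, "
      f"p50 latency {sorted(latencies)[n // 2]:.2f}s")
```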

Seats are limited! Secure your spot now!

Location
Tower A, Summit, Brigade Metropolis, Garudachar Palya, Mahadevapura, Bengaluru, Karnataka 560048, India
Hosted by vLLM Meetups and Events.
Join the vLLM community to discuss optimizing LLM inference!