

vLLM Inference Meetup: Bangalore, India
Want to cut your LLM inference costs by 50-70% while handling 2-3x more concurrent users?
Join experts from Red Hat, NxtGen, and Harness.io, your hosts for Bangalore's premier vLLM meetup, focused on production-ready techniques for optimizing LLM inference. This is your chance to learn directly from engineers running vLLM at scale, get hands-on with NVIDIA GPUs (provided free!), and walk away with strategies for cutting inference costs and serving more concurrent users.
What makes this unmissable:
Free NVIDIA GPU instances for every attendee: deploy and benchmark real workloads
Production insights from experts
Hands-on lab session where you'll optimize actual vLLM deployments
Network with ML engineers, DevOps practitioners, and AI infrastructure experts solving similar challenges
Learn PagedAttention, KV cache optimization, and intelligent model routing techniques you can implement immediately (see the starter sketch after this list)
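
PagedAttention is built into vLLM's engine, so you don't invoke it directly; what you tune are knobs like the GPU memory budget and prefix caching. Here is a minimal offline-inference sketch to give you a feel for it before the meetup, assuming `pip install vllm` on a machine with a CUDA GPU; the model id is just a placeholder, and any Hugging Face model id you have access to works.

```python
from vllm import LLM, SamplingParams

# Placeholder model id: swap in whatever model you plan to serve.
llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    gpu_memory_utilization=0.90,  # fraction of GPU memory for weights + paged KV cache
    enable_prefix_caching=True,   # reuse KV cache blocks across prompts sharing a prefix
    max_model_len=4096,           # cap context length to bound KV cache growth
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain PagedAttention in one sentence."], sampling)
for out in outputs:
    print(out.outputs[0].text)
```

Prefix caching pays off most when many requests share a long common prefix (e.g. a system prompt), which is exactly the kind of trade-off the sessions below dig into.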
Come Prepared With
Laptop with SSH client (we provide the GPU instances!)
Government-issued photo ID for venue security
Your toughest LLM deployment challenges and questions
Session Lineup
14:30 - Registration & Opening Remarks (Red Hat & NxtGen)
15:00 - Keynote: Turning GenAI Investments into Results - Why Inference Matters
Understand the economics and technical requirements of production inference
15:30 - High-Performance LLM Inference with vLLM - Prasad Mukhedkar
Deep dive into PagedAttention and the KV cache optimizations that unlock 2-3x higher GPU utilization
16:00 - Scaling Distributed LLM Inference with llm-d - TBA
Intelligent routing strategies that reduce costs 30-50% while maintaining SLAs
16:30 - M: NxtGen's Agentic Platform - Abhishek Kumar (NxtGen)
Real-world agent architectures and deployment patterns
17:00 - vLLM Semantic Routing - Ompragash (Harness)
Smart request routing across models for optimal cost-performance trade-offs
17:30 - ☕ Networking Break with Snacks
18:00-19:00 - 🔥 Hands-on Lab: vLLM Inference with NVIDIA GPUs
Deploy, benchmark, and optimize real inference workloads with expert guidance
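
If you want to warm up before the lab, here is a minimal sketch of querying a vLLM server through its OpenAI-compatible API, assuming you have started one with `vllm serve` (the model id and prompt are placeholders; the lab instances will come preconfigured).

```python
# Assumes a vLLM server was started first, e.g.:
#   vllm serve Qwen/Qwen2.5-0.5B-Instruct --gpu-memory-utilization 0.90
# vLLM exposes an OpenAI-compatible API on port 8000 by default.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-0.5B-Instruct",
        "messages": [{"role": "user", "content": "What does the KV cache store?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```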
Seats are limited! Secure your spot now!