

vLLM Inference Meetup: Bangalore, India
Want to cut your LLM inference costs by 50-70% while handling 2-3x more concurrent users?
Join experts from Red Hat, NxtGen, and Harness.io, your hosts for Bangalore's premier vLLM meetup, where you'll learn production-ready techniques for optimizing LLM inference. This is your chance to learn directly from engineers running vLLM at scale, get hands-on with NVIDIA GPUs (provided free!), and walk away with strategies that deliver those savings in your own deployments.
What makes this unmissable:
Free NVIDIA GPU instances for every attendee: deploy and benchmark real workloads
Production insights from engineers at Red Hat, NxtGen, Harness, and NVIDIA
Hands-on lab session where you'll optimize actual vLLM deployments
Network with ML engineers, DevOps practitioners, and AI infrastructure experts solving similar challenges
Learn PagedAttention, KV cache optimization, and intelligent model routing techniques you can implement immediately (see the short sketch below)
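To make that concrete, here is a minimal sketch of the kind of technique the talks cover, using vLLM's offline Python API. PagedAttention is vLLM's default attention mechanism, and prefix caching reuses KV cache blocks across requests that share a prompt prefix; the model name and parameter values below are illustrative assumptions, not the lab configuration.

```python
# Minimal sketch, assuming vLLM is installed (pip install vllm).
# PagedAttention is on by default in vLLM; enable_prefix_caching lets
# requests that share a prompt prefix reuse the same KV cache blocks.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",     # illustrative small model; swap in your own
    gpu_memory_utilization=0.90,   # fraction of VRAM for weights + KV cache
    enable_prefix_caching=True,    # reuse KV blocks across shared prefixes
)

sampling = SamplingParams(temperature=0.7, max_tokens=64)
prefix = "You are a helpful assistant. Answer briefly.\n\nQ: "
prompts = [prefix + q for q in ("What is vLLM?", "What is PagedAttention?")]

# The second prompt reuses the cached KV blocks for the shared prefix.
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text.strip())
```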
Come Prepared With
Laptop with SSH client (we provide the GPU instances!)
Government-issued photo ID for venue security
Your toughest LLM deployment challenges and questions
Session Lineup
02:00 PM - 02:30 PM: Welcome and Opening Remarks (Red Hat & NxtGen)
02:30 PM - 03:00 PM: Keynote: Turning GenAI Investments into Results - Why Inference Matters - Steve Shirkey
Understand the economics and technical requirements of production inference
03:00 PM - 03:30 PM: High-Performance LLM Inference with vLLM - Prasad Mukhedkar, Red Hat
Deep dive into PagedAttention and the KV cache optimizations that unlock 2-3x higher GPU utilization
03:30 PM - 04:00 PM: vLLM Semantic Routing - Ompragash, Harness
Smart request routing across models for optimal cost-performance trade-offs
04:00 PM - 04:10 PM: 10-Minute Break
04:10 PM - 04:40 PM: NxtGen's Agentic Platform - Abhishek Kumar, NxtGen
Real-world agent architectures and deployment patterns
04:40 PM - 05:10 PM: How to Size NVIDIA GPUs for Inference Workloads? - Akash Paul, NVIDIA
05:10 PM - 05:30 PM: ☕ Networking Break with Snacks
05:30 PM - 06:30 PM: 🔥 Hands-on Lab: vLLM Inference with NVIDIA GPUs
Deploy, benchmark, and optimize real inference workloads with expert guidance
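If you want a head start on the lab, the sketch below shows the sort of request you might send to a running vLLM server, which exposes an OpenAI-compatible API (on port 8000 by default). The host, port, and model name are placeholder assumptions; the lab instances may be configured differently.

```python
# Sketch of querying a vLLM OpenAI-compatible server, e.g. one started with:
#   vllm serve facebook/opt-125m
# Host, port, and model below are assumptions; your lab instance may differ.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "vLLM speeds up LLM inference because",
        "max_tokens": 48,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```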
Seats are limited! Secure your spot now!