Join the vLLM community to discuss optimizing LLM inference!
178 Went

vLLM Inference Meetup: Bangalore, India

Bengaluru, Karnataka
Registration
Registration Closed
This event is not currently taking registrations. You may contact the host or subscribe to receive updates.
About Event

Want to cut your LLM inference costs by 50-70% while handling 2-3x more concurrent users?

Join experts from Red Hat, NxtGen, and Harness.io, your hosts for Bangalore's premier vLLM meetup, where you'll learn production-ready techniques for optimizing LLM inference. This is your chance to learn directly from engineers running vLLM at scale, get hands-on with NVIDIA GPUs (provided free!), and walk away with strategies that cut inference costs by 50-70% while handling 2-3x more concurrent users.

What makes this unmissable:

  • Free NVIDIA GPU instances for every attendee: deploy and benchmark real workloads

  • Production insights from engineers at Red Hat, NxtGen, Harness, and NVIDIA who run vLLM at scale

  • Hands-on lab session where you'll optimize actual vLLM deployments

  • Network with ML engineers, DevOps practitioners, and AI infrastructure experts solving similar challenges

  • Learn PagedAttention, KV cache optimization, and intelligent model routing techniques you can implement immediately
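The core idea behind PagedAttention can be sketched in plain Python: instead of reserving one contiguous KV-cache slab per request (sized for the worst-case sequence length), the cache is split into small fixed-size blocks that are handed out only as tokens are generated. This is a simplified illustrative model for intuition ahead of the sessions; the class, block size, and bookkeeping below are invented for the sketch and are not vLLM's actual API:

```python
# Toy model of paged KV-cache allocation (illustrative only; not vLLM's API).
# A shared pool of small blocks is committed on demand, so no request pins
# memory for tokens it hasn't generated yet.

BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM uses a similarly small block)

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}                      # seq_id -> list of block ids
        self.num_tokens = {}                        # seq_id -> tokens stored

    def append_token(self, seq_id: int) -> bool:
        """Account for one new token; grab a fresh block only when needed."""
        n = self.num_tokens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:            # current block is full (or first token)
            if not self.free_blocks:
                return False               # pool exhausted: caller must preempt
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.num_tokens[seq_id] = n + 1
        return True

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.num_tokens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)         # 8 * 16 = 128 token slots total
for _ in range(20):                        # one request generates 20 tokens
    assert cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))          # 2 blocks used: ceil(20 / 16)
```

With contiguous preallocation at a 128-token maximum, this one request would have reserved the entire pool; here it pins only 2 of 8 blocks, leaving the rest free for concurrent requests. That reduced fragmentation is what lets more sequences share the same GPU memory.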

Come Prepared With

  • Laptop with an SSH client (we provide the GPU instances!)

  • Government-issued photo ID for venue security

  • Your toughest LLM deployment challenges and questions

Session Lineup

02:00 PM - 02:30 PM: Welcome and Opening Remarks (Red Hat & NxtGen)

02:30 PM - 03:00 PM: Keynote: Turning GenAI Investments into Results - Why Inference Matters - Steve Shirkey

Understand the economics and technical requirements of production inference

03:00 PM - 03:30 PM: High-Performance LLM Inference with vLLM - Prasad Mukhedkar, Red Hat

Deep dive into PagedAttention and KV cache optimizations that unlock 2-3x GPU utilisation

03:30 PM - 04:00 PM: vLLM Semantic Routing - Ompragash, Harness

Smart request routing across models for optimal cost-performance trade-offs

04:00 PM - 04:10 PM: 10-Minute Break

04:10 PM - 04:40 PM: NxtGen's Agentic Platform - Abhishek Kumar, NxtGen

Real-world agent architectures and deployment patterns

04:40 PM - 05:10 PM: How to Size NVIDIA GPUs for Inference Workloads - Akash Paul, NVIDIA

05:10 PM - 05:30 PM: ☕ Networking Break with Snacks

05:30 PM - 06:30 PM: 🔥 Hands-on Lab: vLLM Inference with NVIDIA GPUs

Deploy, benchmark, and optimize real inference workloads with expert guidance

Seats are limited! Secure your spot now!

Location
Please register to see the exact location of this event.
Bengaluru, Karnataka