

vLLM Inference Meetup: Bangalore, India
Want to cut your LLM inference costs by 50-70% while handling 2-3x more concurrent users?
Join experts from Red Hat, NxtGen, and Harness.io, your hosts for Bangalore's premier vLLM meetup, where you'll learn production-ready techniques for optimizing LLM inference. This is your chance to learn directly from engineers running vLLM at scale, get hands-on with NVIDIA GPUs (provided free!), and walk away with strategies that deliver those savings in your own deployments.
What makes this unmissable:
Free NVIDIA GPU instances for every attendee: deploy and benchmark real workloads
Production insights from engineers at Red Hat, NxtGen, Harness, and NVIDIA
Hands-on lab session where you'll optimize actual vLLM deployments
Network with ML engineers, DevOps practitioners, and AI infrastructure experts solving similar challenges
Learn PagedAttention, KV cache optimization, and intelligent model routing techniques you can implement immediately (see the short sketch below)
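To make that concrete, here is a minimal sketch of the kind of technique the talks cover, using vLLM's offline Python API. PagedAttention is vLLM's default attention mechanism, and prefix caching reuses KV cache blocks across requests that share a prompt prefix; the model name and parameter values below are illustrative assumptions, not the lab configuration.

```python
# Minimal sketch, assuming vLLM is installed (pip install vllm).
# PagedAttention is on by default in vLLM; enable_prefix_caching lets
# requests that share a prompt prefix reuse the same KV cache blocks.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",     # illustrative small model; swap in your own
    gpu_memory_utilization=0.90,   # fraction of VRAM for weights + KV cache
    enable_prefix_caching=True,    # reuse KV blocks across shared prefixes
)

sampling = SamplingParams(temperature=0.7, max_tokens=64)
prefix = "You are a helpful assistant. Answer briefly.\n\nQ: "
prompts = [prefix + q for q in ("What is vLLM?", "What is PagedAttention?")]

# The second prompt reuses the cached KV blocks for the shared prefix.
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text.strip())
```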
Come Prepared With
Laptop with SSH client (we provide the GPU instances!)
Government-issued photo ID for venue security
Your toughest LLM deployment challenges and questions
Session Lineup
02:00 PM - 02:30 PM: Welcome and Opening Remarks (Red Hat & NxtGen)
02:30 PM - 03:00 PM: Keynote: Turning GenAI Investments into Results - Why Inference Matters - Steve Shirkey
Understand the economics and technical requirements of production inference
03:00 PM - 03:30 PM: High-Performance LLM Inference with vLLM - Prasad Mukhedkar, Red Hat
Deep dive into PagedAttention and the KV cache optimizations that unlock 2-3x higher GPU utilization
03:30 PM - 04:00 PM: vLLM Semantic Routing - Ompragash, Harness
Smart request routing across models for optimal cost-performance trade-offs
04:00 PM - 04:10 PM: 10-Minute Break
04:10 PM - 04:40 PM: NxtGen's Agentic Platform - Abhishek Kumar, NxtGen
Real-world agent architectures and deployment patterns
04:40 PM - 05:10 PM: How to Size NVIDIA GPUs for Inference Workloads? - Akash Paul, NVIDIA
05:10 PM - 05:30 PM: ☕ Networking Break with Snacks
05:30 PM - 06:30 PM: 🔥 Hands-on Lab: vLLM Inference with NVIDIA GPUs
Deploy, benchmark, and optimize real inference workloads with expert guidance
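If you want a head start on the lab, the sketch below shows the sort of request you might send to a running vLLM server, which exposes an OpenAI-compatible API (on port 8000 by default). The host, port, and model name are placeholder assumptions; the lab instances may be configured differently.

```python
# Sketch of querying a vLLM OpenAI-compatible server, e.g. one started with:
#   vllm serve facebook/opt-125m
# Host, port, and model below are assumptions; your lab instance may differ.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",
        "prompt": "vLLM speeds up LLM inference because",
        "max_tokens": 48,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```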
Seats are limited! Secure your spot now!