Ayush Satyam

Join us for a focused deep dive into compiling vLLM models with torch.compile. We'll walk through why torch.compile matters for large-model performance and how vLLM leverages it, demonstrate concrete compilation strategies and flags, and profile real-world gains on inference workloads. 

This session is best suited to people who are comfortable with vLLM or PyTorch and basic LLM concepts and who want actionable techniques to speed up inference. The session is mostly theoretical, with open discussion afterwards; questions and specific pain points are welcome. Meet at the NS library on March 9.

vLLM compile deep dive (it's all torch.compile)

Kev🪽

Jackson McDonough

Jake Simonds

Juan Palomino

Hrishikesh Omprakash Yadav

Adam Momen

Jarrett Vickers

Raunaq

Kawin Rungsimuntakul

AER LABS

Network School

Namgyu Youn
