
Scalable Inference Algorithms for Large Language Models

Hosted by Daniel Kang
Past Event
About Event

Abstract
Inference efficiency is a key bottleneck in deploying large language models (LLMs) at scale, especially for applications that require long-context understanding or test-time scaling for improved reasoning.

In this seminar, Woomin Song (KAIST) will present two training-free inference frameworks that significantly reduce latency and memory costs while remaining fully compatible with existing models:

  1. REFORM (NeurIPS 2025): Enables efficient long-context inference by extending usable context length far beyond pretraining limits. It achieves high accuracy with reduced compute and memory overhead.

  2. STAND (EMNLP 2025): Accelerates test-time scaling methods (e.g., best-of-N sampling, tree search) through model-free speculative decoding, delivering substantial speedups without sacrificing accuracy.

Together, these approaches demonstrate how rethinking inference—rather than retraining or scaling models—can deliver practical gains in performance, cost, and deployability for real-world LLM systems.
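As background for the speculative decoding idea mentioned above, the toy sketch below shows the generic draft-then-verify loop: a cheap draft proposes several tokens, the target verifies them, and the output is guaranteed to match plain greedy decoding from the target. This is an illustrative assumption about speculative decoding in general, not STAND's actual model-free mechanism; both "models" are deterministic toy functions invented for the example.

```python
def target_next(tokens):
    # Hypothetical "target model": next token is (sum of context) % 7.
    return sum(tokens) % 7

def draft_next(tokens):
    # Hypothetical cheap "draft": only sees the last 3 tokens,
    # so it usually agrees with the target but sometimes diverges.
    return sum(tokens[-3:]) % 7

def speculative_generate(prompt, n_new, k=4):
    """Greedily generate n_new tokens, verifying k-token drafts per round."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify with the target: accept the longest matching prefix.
        #    (A real system scores all k positions in one forward pass.)
        accepted, ctx = [], list(tokens)
        for t in draft:
            correct = target_next(ctx)
            if t == correct:
                accepted.append(t)
                ctx.append(t)
            else:
                # On a mismatch, keep the target's token and stop this round,
                # so output always equals plain greedy target decoding.
                accepted.append(correct)
                ctx.append(correct)
                break
        tokens.extend(accepted)
    return tokens[:len(prompt) + n_new]
```

Because every accepted token is exactly what the target would have produced, the speedup comes purely from verifying several drafted tokens per target pass, with no change to the output.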


Speaker Bio
Woomin Song is a Ph.D. student at KAIST AI, advised by Prof. Jinwoo Shin. His research focuses on building efficient machine learning systems, specifically targeting the reduction of inference costs for Large Language Models (LLMs).

He previously worked as an Applied Scientist Intern at Amazon AGI. He holds a B.S. in Electrical Engineering and Computer Science (double major) with a minor in Mathematics from KAIST (2022). His recent work on architectural modifications for computational efficiency has been accepted at top-tier conferences including NeurIPS and EMNLP.

Location
NS Library