
Accelerate Your Agent with 2 Lines of Code!

Hosted by NICE AI Talk
About Event

ThunderAgent is the first system to use program abstraction to unify GPU, CPU, and remote tool scheduling for distributed LLM agent inference and reinforcement learning rollouts. With just two lines of code, it boosts KV cache efficiency, balances memory across nodes, and prevents resource leaks, achieving 1.5–3.9× higher inference and rollout throughput while reducing disk usage by 4.2×.

Unlike traditional request-level engines, ThunderAgent treats multi-step agent workflows as programs, using a principled STP (Space-Time Product) cost model to optimize scheduling and ensure robustness under high concurrency. Fully compatible with existing OpenAI API calls, it seamlessly upgrades vLLM/SGLang engines into a fast, simple, and robust agent inference system.

Speaker:

Hao Kang, PhD candidate (3rd year) at Georgia Tech, advised by Tushar Krishna, and currently a visiting researcher at MIT collaborating with Prof. Song Han. His research focuses on LLM agent training and inference systems, agentic financial trading systems, and asynchronous parallel algorithms for agents. His work has been accepted at ICML, NeurIPS, and MLSys, and he has received a NeurIPS Spotlight and an ENLSP Best Paper Candidate award.
