
Numerical Non-Determinism in LLM Reasoning and a Systematic Solution to RL Training–Inference Mismatch

Hosted by NICE AI Talk
About Event

Youtube livestream link: https://youtube.com/live/4ay81dNuSR4

Talk Title:
Numerical Non-Determinism in LLM Reasoning and a Systematic Solution to RL Training–Inference Mismatch


Abstract:
LLM generation is not deterministic even when the temperature is set to zero. System-level configuration changes—such as variations in batch size and parallelization strategy—commonly occur in real-world serving due to continuous batching, and can introduce numerical non-determinism. This issue is even more pronounced in reinforcement learning (RL), where the training and rollout engines naturally operate with different batch sizes, kernel selections, and parallelization strategies.
This training–rollout mismatch can lead to suboptimal performance and even training collapse, especially for Mixture-of-Experts (MoE) models. In this talk, I will analyze why this phenomenon occurs and present a system-level solution: achieving determinism by designing and deploying deterministic GPU kernels.
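
To make the mismatch concrete, here is a minimal PyTorch sketch (not from the talk; shapes and names are illustrative) showing how the same input row can produce bitwise-different outputs depending on the batch it is processed in:

```python
import torch

# Illustrative shapes; any GEMM-heavy op can show the same effect.
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
weight = torch.randn(4096, 4096, device=device)
x = torch.randn(8, 4096, device=device)

# The same row, computed as part of a batch of 8 vs. a batch of 1.
out_batched = x @ weight        # batch size 8
out_single = x[:1] @ weight     # batch size 1

# Different batch sizes can select different kernels with different
# tiling and reduction orders, so these two results may differ bitwise
# even though both are "the same" float32 matmul.
print(torch.equal(out_batched[:1], out_single))
print((out_batched[:1] - out_single).abs().max().item())
```

Because floating-point addition is not associative, any change in reduction order (from kernel selection, tiling, or parallelization strategy) can shift the low-order bits. Over a long autoregressive rollout these shifts can compound, which is why the policy that generated a trajectory and the policy being trained on it can silently diverge.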


Invited Speaker: Zirui “Ray” Liu

Zirui "Ray" Liu is an Assistant Professor in the Department of Computer Science at the University of Minnesota. His research focuses on large language models and their applications, with particular emphasis on long-context problems and long-term memory. He is also deeply interested in machine learning systems, including deterministic kernels and the design and implementation of low-precision systems.


Host: David Li

David Li is a Ph.D. student at Arizona State University, advised by Prof. Huan Liu. He obtained his bachelor's degree in Computer Science from Beijing Language and Culture University (BLCU) and his master's degree in Data Science from the University of California, San Diego (UCSD). He has also been a research intern at the Beijing Advanced Innovation Center for Language Resources and at the AI Lab of Xiaomi. He is the founder of the open-source research community OracleLLM.
