

LLM Reasoning with Confidence
Talk Abstract
As LLMs shift toward reasoning, a critical bottleneck remains: how to scale reasoning without expensive, task-specific external rewards, especially for open-ended tasks. In this talk, I introduce a confidence-centric view of reasoning. Instead of only asking models to "think longer" or "sample more," we ask how to quantify and use confidence as a first-class signal. I present Self-Certainty, an intrinsic metric that quantifies the unambiguity of a model's reasoning. I show how this enables (1) Inference Scaling: scalable Best-of-N selection without external reward models (https://arxiv.org/pdf/2502.18581); and (2) Self-Improvement: reinforcement learning using the model's own confidence as the learning signal, introducing INTUITOR, which improves on open-ended tasks while matching methods that rely on external supervision (https://arxiv.org/pdf/2505.19590). I'll close by discussing when this approach works, when it fails, and what "reasoning with confidence" suggests for building more capable and reliable frontier agents.
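To make the Best-of-N idea concrete, here is a minimal sketch of intrinsic-confidence selection: score each sampled response by how far its per-token output distributions are from uniform, then keep the highest-scoring one. This is a hypothetical illustration under assumed inputs (lists of token probability vectors), not the exact formulation from the papers above; the function names `self_certainty` and `best_of_n` are mine.

```python
import math

def self_certainty(token_dists):
    """Score a response by how peaked its per-token distributions are
    (higher = more confident). Sketch of an intrinsic confidence metric,
    not the paper's exact formula: average KL(uniform || p) per token."""
    scores = []
    for dist in token_dists:
        v = len(dist)  # vocabulary size for this toy distribution
        kl = sum((1.0 / v) * math.log((1.0 / v) / max(p, 1e-12)) for p in dist)
        scores.append(kl)
    return sum(scores) / len(scores)

def best_of_n(candidates):
    """Best-of-N selection using only the model's own probabilities,
    with no external reward model. Each candidate is a dict holding
    its text and its per-token probability distributions."""
    return max(candidates, key=lambda c: self_certainty(c["dists"]))
```

The key point the sketch captures: the selection signal comes entirely from the model's own output distributions, so it scales to open-ended tasks where no task-specific verifier or reward model exists.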
Our Speaker
Xuandong Zhao is a Postdoctoral Researcher at UC Berkeley, affiliated with the Berkeley Artificial Intelligence Research (BAIR) Lab, where he works with Prof. Dawn Song. He earned his Ph.D. in Computer Science from UC Santa Barbara, advised by Prof. Yu-Xiang Wang and Prof. Lei Li. Sitting at the intersection of machine learning, NLP, and AI safety, Xuandong’s research focuses on the capability and reliability of frontier AI models and agents. He aims to build increasingly powerful systems through scalable reinforcement learning and self-improvement, while ensuring they remain safe and aligned with human values. Xuandong has published over 40 papers in top-tier venues and has served as an Area Chair for conferences such as ACL. He is a recipient of the UCSB Chancellor’s Fellowship and has been recognized with Rising Star awards in both adversarial machine learning and AI.
Homepage: https://xuandongzhao.github.io/
Our Host
Xiao Pan is an Applied Scientist on the Amazon Rufus team. Her work focuses on synthetic data curation pipelines, model post-training, and agentic framework design for Rufus. She also leads the intelligent summarization project on the Amazon search results page. She is currently exploring agentic reinforcement learning.
She holds dual master’s degrees from Télécom Paris and Université Paris-Saclay. Her research spans automatic speech recognition (ASR), language models, and multilingual machine translation, with multiple publications in EMNLP and ACL.
Homepage: https://panxiao1994.github.io/