

NICE Academy Talk: Towards Honest Language Models for Deductive Reasoning
Join us for the talk at NICE next Monday evening!
This talk covers a timely topic: deductive reasoning in large language models.
YouTube Livestream link: https://youtube.com/live/U3mCNd3GlK0
Speaker: Jiarui Liu
Jiarui Liu is a third-year PhD student in Computer Science at Carnegie Mellon University, advised by Prof. Mona Diab. His current research interests focus on alignment for reasoning models and NLP for social good.
Personal website: https://jiarui-liu.github.io/
Abstract:
Deductive reasoning is the process of deriving conclusions strictly from the given premises, without relying on external knowledge. In the work to be presented in this talk, we define honesty in this setting as a model's ability to respond only when the conclusion is logically entailed by the premises, and to abstain otherwise.
However, current language models often fail to reason honestly, producing unwarranted answers when the input is insufficient. To study this challenge, we formulate honest deductive reasoning as multi-step tasks where models must either derive the correct conclusion or abstain. We curate two datasets from graph structures, one for linear algebra and one for logical inference, and introduce unanswerable cases by randomly perturbing an edge in half of the instances. We find that prompting and existing training methods, including GRPO with or without supervised fine-tuning initialization, struggle on these tasks. In particular, GRPO optimizes only for final task outcomes, leaving models vulnerable to collapse when negative rewards dominate early training.
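The edge-perturbation construction described above can be illustrated with a toy sketch. This is not the authors' actual data pipeline; the chain-of-edges representation, function names, and labels below are illustrative assumptions. The idea it demonstrates: removing one premise (edge) breaks entailment, so the honest answer flips from "derive the conclusion" to "abstain."

```python
import random

def make_instance(chain_len, perturb, rng):
    """Build a toy multi-step entailment instance: premises are the edges
    of a chain, and the conclusion is 'node 0 reaches node chain_len'.
    If perturb is True, drop one random edge so the conclusion is no
    longer entailed and the honest label becomes 'abstain'."""
    edges = [(i, i + 1) for i in range(chain_len)]
    if perturb:
        edges.pop(rng.randrange(chain_len))
    # The conclusion follows iff every step of the chain survives.
    entailed = len(edges) == chain_len
    return edges, ("derive" if entailed else "abstain")

rng = random.Random(0)
# Perturb every other instance, so half the dataset is unanswerable.
dataset = [make_instance(5, i % 2 == 1, rng) for i in range(10)]
```

A model trained on such data is rewarded only when it derives the conclusion on intact instances and abstains on perturbed ones.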
To address this, we propose ANCHOR, a reinforcement learning method that injects ground-truth trajectories into rollouts, preventing early training collapse. Our results demonstrate that this method stabilizes learning and significantly improves overall reasoning performance, underscoring the importance of training dynamics for enabling honest deductive reasoning in language models.
Our Host: Wenyue Hua
Wenyue Hua is currently a senior researcher at Microsoft Research, AI Frontiers. She was a CS postdoctoral researcher at UCSB working with Prof. William Wang. She received her Ph.D. from Rutgers University-New Brunswick under the supervision of Professor Yongfeng Zhang. Her research focuses on the safety and efficiency of LLM agents, multi-agent interaction, and LLM reasoning. She was selected as a KAUST AI Rising Star in 2025 and has published over 40 papers at top natural language processing and machine learning venues such as ACL, EMNLP, ICLR, NeurIPS, and TACL.
Personal Website: https://wenyueh.github.io/