

Three Frontiers of Scalable RL for LLMs
NICE Talk 172 invites Bingxiang He @HBX_hbx, PhD student at Tsinghua University @TsinghuaNLP, to share Three Frontiers of Scalable RL for LLMs.
Talk Time ⏰ May 15, 22:00-23:00 EST
🤠 Can RL advance model capabilities without any supervised signals?
🧐 Three Frontiers, One Map: Charting the Feasible Region of Scalable RL Matters More Than Inventing Another Trick
😈 Explicit length penalties and more lenient verifiers both led to significant performance degradation.
😈 Switching to a higher-scoring teacher can paradoxically shrink—or even reverse—student gains.
Work covered in the talk:
🌟JustRL: https://github.com/thunlp/JustRL
🌟Unsupervised RLVR: https://github.com/PRIME-RL/TTRL/tree/urlvr-dev
🌟Rethinking On-Policy Distillation: https://github.com/thunlp/OPD
Host: Cheng Qian, PhD student at UIUC
#AI #LLM #scalinglaw #model