
Three Frontiers of Scalable RL for LLMs

Hosted by NICE AI Talk
YouTube
Past Event
About Event

NICE Talk 172 invites Bingxiang He (@HBX_hbx), a PhD student at Tsinghua University (@TsinghuaNLP), to present "Three Frontiers of Scalable RL for LLMs."

Talk Time ⏰ May 15, 22:00-23:00 EST

🤠 Can RL advance model capabilities without any supervised signals?

🧐 Three Frontiers, One Map: Charting the Feasible Region of Scalable RL Matters More Than Inventing Another Trick

😈 Explicit length penalties and more lenient verifiers both led to significant performance degradation.

😈 Switching to a higher-scoring teacher can paradoxically shrink, or even reverse, student gains.

Work covered in the talk:

🌟JustRL: https://github.com/thunlp/JustRL

🌟Unsupervised RLVR: https://github.com/PRIME-RL/TTRL/tree/urlvr-dev

🌟Rethinking On-Policy Distillation: https://github.com/thunlp/OPD

Host: Cheng Qian, PhD at UIUC

#AI #LLM #scalinglaw #model
