Cover Image for RL Verifiers by Yash More (Cerebras)
Presented by
Studio 535
Subscribe to event notifications at 535toronto.substack.com
Get Tickets
Approval Required
Your registration is subject to approval by the host.
Ticket Price
CA$5.35
About Event

Yash will give an overview of RLVR and verifiers.

Reinforcement Learning with Verifiable Rewards (RLVR) replaces noisy human feedback with deterministic reward signals that can be checked automatically. In this talk, we will examine what makes a reward verifiable, how credit can be assigned effectively through process and outcome supervision, and the algorithms that enable RLVR to scale and shape the reasoning capabilities of LLMs.
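
To make the idea of a verifiable reward concrete, here is a minimal sketch of an outcome-level verifier for numeric answers. It is an illustration only, not taken from the talk or the papers below: the function names (extract_final_answer, outcome_reward) and the "last number in the completion is the final answer" convention are assumptions made for this example.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number found in the completion, or None if there is none.

    Assumption for this sketch: the final answer is the last number in the text.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None


def outcome_reward(completion: str, reference: str) -> float:
    """Deterministic outcome reward: 1.0 if the final answer matches, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # nothing to verify, so no reward
    return 1.0 if float(answer) == float(reference) else 0.0


if __name__ == "__main__":
    print(outcome_reward("2 + 2 = 4, so the answer is 4", "4"))
    print(outcome_reward("I am not sure.", "4"))
```

The same completion always receives the same reward, which is the property that separates this kind of signal from a human preference label. A process-level variant would score intermediate steps instead of only the final answer.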

Some materials (for better context):
- DeepSeekMath (https://arxiv.org/abs/2402.03300)
- Tulu3 (https://arxiv.org/pdf/2411.15124)
- Scaling LLM Test-Time Compute (https://arxiv.org/pdf/2408.03314)

There will be snacks and drinks.

Location
Studio 535
535 Queen St E, Toronto, ON M5A 1V1, Canada
Use the back entrance by the parking lot; our door has a 535 sticker on it.