

Maximizing the value of reinforcement learning
Despite decades of research in RL on sample efficiency and generalization, the scaling of RL currently centers on policy gradient methods that often eschew not only these prior innovations, but even value functions. Which aspects of the pre-LLM RL field are most promising to revisit? Is there a place for temporal-difference methods in the transformer era of RL?
Join Vmax and SPC for a panel discussion featuring Danijar Hafner (creator of Dreamer), Ashvin Nair (ex-OpenAI, now RL foundations at Cursor), and Nate Rahn (research scientist at Anthropic) on which parts of pre-LLM reinforcement learning are likely to be the most fruitful in the LLM era.
Schedule:
5:00pm - doors open
5:30pm - 6:30pm panel discussion
6:30pm - 7:30pm drinks and networking
Who should attend: Researchers and engineers interested in reinforcement learning.