

AISAR Seminar - Joar Skalse - The Theoretical Foundations of Reward Learning
🎤 Speaker: Joar Skalse – PhD @ University of Oxford | Director @ DEDUCTO
📖 Title: The Theoretical Foundations of Reward Learning
Abstract: In this talk, I will provide an overview of my research on how to build a theoretical foundation for the field of reward learning, including my main motivations for pursuing this research, and some of my core results.
This research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems? How similar must two different goal specifications be in order to not be hackable? What is the right way to quantify the differences and similarities between different goal specifications in a given specification language? What happens if you execute a task specification that is not close to the “ideal” specification? Which specification learning algorithms are guaranteed to converge to a good specification? How sensitive are these specification learning algorithms to misspecification? If we have a bound on the error in a specification (under some metric), can we devise safe methods for optimising it?
Find more details at: https://www.lesswrong.com/s/TEybbkyHpMEB2HTv3