Presented by

The AISAR Scholarships aim to promote research in AI Safety in Argentina, connecting talented students with top-level researchers.

This online seminar series features international AI safety experts.

AISAR Seminar - Joar Skalse - The Theoretical Foundations of Reward Learning

AISAR Seminars

Zoom

About Event

🎤 Speaker: Joar Skalse – PhD @ University of Oxford | Director @ DEDUCTO

📖 Title: The Theoretical Foundations of Reward Learning

Abstract: In this talk, I will provide an overview of my research on how to build a theoretical foundation for the field of reward learning, including my main motivations for pursuing this research, and some of my core results.

This research agenda involves answering questions such as: What is the right method for expressing goals and instructions to AI systems? How similar must two different goal specifications be in order to not be hackable? What is the right way to quantify the differences and similarities between different goal specifications in a given specification language? What happens if you execute a task specification that is not close to the “ideal” specification? Which specification learning algorithms are guaranteed to converge to a good specification? How sensitive are these specification learning algorithms to misspecification? If we have a bound on the error in a specification (under some metric), can we devise safe methods for optimising it?

More details at: https://www.lesswrong.com/s/TEybbkyHpMEB2HTv3