

Teaching Human Values to AI Systems
Evgenii Opryshko explains how we currently teach AI systems to be helpful, honest, and harmless. The talk covers RLHF, RLAIF, Constitutional AI, deliberative alignment, and other approaches used by frontier AI companies and open-source projects to align language models with human values.
Event Schedule
6:00 to 6:30 - Food and introductions
6:30 to 7:30 - Presentation and Q&A
7:30 to 9:00 - Open discussion
If you can't attend in person, join our live stream starting at 6:30 pm via this link.
This is part of our weekly AI Safety Thursdays series. Join us in examining questions like:
How do we ensure AI systems are aligned with human interests?
How do we measure and mitigate potential risks from advanced AI systems?
What does safer AI development look like?