Presented by

Catalyzing Toronto's role in steering AI progress toward a future of human flourishing. Join us for a variety of events on technical AI safety, governance in a world of advanced AI, and more.

AI Safety Thursday: Chain-of-Thought Monitoring for AI Control

Name: AI Safety Thursday: Chain-of-Thought Monitoring for AI Control
Start: 2025-10-30T18:00:00.000-04:00
End: 2025-10-30T21:00:00.000-04:00
Location: 30 Adelaide St E 12th floor

Trajectory Labs

Toronto, Ontario

About Event

Modern reasoning models do a lot of thinking in natural language before producing their outputs. Can we catch misbehaviors by our LLMs and interpret their motivations simply by reading these chains of thought?

In this talk, Rauno Arike and Rohan Subramani will give an overview of research areas in chain-of-thought monitorability and AI control. They will also discuss their recent research on how useful chain-of-thought monitoring is for ensuring that LLM agents pursue only the objectives their developers intended.

Event Schedule
6:00 to 6:30 - Food & Networking
6:30 to 7:30 - Main Presentation & Questions
7:30 to 9:00 - Breakout Discussions

If you can't make it in person, feel free to join the live stream starting at 6:30 pm, via this link.
