

AI Safety Poland Talks #4
Welcome to AI Safety Poland Talks!
A biweekly series where researchers, professionals, and enthusiasts, either from Poland or connected to the Polish AI community, share their work on AI safety.
💁 Topic: Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
📣 Speaker: Tomek Korbak
🇬🇧 Language: English
🗓️ Date: 18.12.2025, 18:00
📍 Location: Online
Speaker Bio
Tomek Korbak is a member of technical staff on OpenAI's safety oversight team, working on safety and security measures for LLM agents. Before that, he worked on safety cases and AI control at the UK AI Security Institute and on honesty post-training at Anthropic. He holds a PhD from the University of Sussex and studied cognitive science, philosophy, and physics at the University of Warsaw.
Abstract
AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.