Presented by

AI Safety Poland

AI Safety Poland is a community in Poland dedicated to reducing the risks posed by artificial intelligence.

44 Went

Artificial Intelligence

AI Safety Poland Talks #9

AI Safety Poland

Google Meet

Past Event

About Event

Welcome to AI Safety Poland Talks!

A biweekly series where researchers, professionals, and enthusiasts from Poland or connected to the Polish AI community share their work on AI Safety.

💁 Topic: Understanding and controlling behavioral self-awareness in LLMs
📣 Speaker: Taras Kutsyk
🇬🇧 Language: English
🗓️ Date: 05.03.2026, 18:00
📍 Location: Online

Speaker Bio
Taras is a PhD student working on mechanistic interpretability at GMUM, Jagiellonian University. He was previously a MATS 7 scholar in Neel Nanda’s cohort and completed the AI Safety Camp and AI Safety Fundamentals programs. His current research focuses on applying interpretability techniques to AI safety problems, such as studying persona generalization in large language models.

Abstract
We have early yet convincing signs that LLMs possess something very similar to human 'behavioral self-awareness': when they act in a certain way, they are generally aware of it. In this talk, we will explore this phenomenon and its implications, particularly regarding AI Safety. The presentation will cover both existing literature and early findings from my ongoing collaboration with Jan Betley & Bartosz Zieliński.
