Presented by
AI Safety Poland
AI Safety Poland is a community in Poland dedicated to reducing the risks posed by artificial intelligence.
30 Went

AI Safety Poland Talks #11

Google Meet
Past Event
About Event

Welcome to AI Safety Poland Talks!

A biweekly series in which researchers, professionals, and enthusiasts from Poland, or with ties to the Polish AI community, share their work on AI safety.

💁 Topic: Jailbreaking Vision-Language Models Through the Visual Modality
📣 Speaker: Jan Dubiński
🇬🇧 Language: English
🗓️ Date: 02.04.2026, 18:00
📍 Location: Online

Speaker Bio
Jan Dubiński is a researcher at the NASK National Research Institute and the Warsaw University of Technology. His work focuses on the safety of generative models, spanning both language and vision systems. He is currently collaborating with Owain Evans’s Truthful AI as part of the Constellation Astra Fellowship. His past research experience includes the MARS Fellowship at the Cambridge AI Safety Hub, internships at the CISPA Helmholtz Center for Information Security and at Sapienza University of Rome, and a collaboration with CERN.

Abstract
Vision-language models (VLMs) combine image understanding with language reasoning, processing visual and textual inputs together to generate a response. While much progress has been made on aligning language models against text-based misuse, the visual modality of VLMs remains comparatively underexplored as a safety risk. We show that harmful intent can be concealed in visually benign inputs and still be recovered by VLMs through context, structure, and inference. For example, in a context-rich image, a harmful object such as a bomb can be replaced with a benign substitute like a banana; the VLM then interprets references to the benign term as referring to the harmful object, leading it to comply with dangerous requests such as bomb-making instructions. Our results show that VLM safety cannot be treated as a mere extension of text-only alignment: robust safety measures must treat the visual modality as a primary target in its own right.
