Presented by
AI Safety Poland
AI Safety Poland is a community in Poland dedicated to reducing the risks posed by artificial intelligence.
30 Went

AI Safety Poland Talks #11

Google Meet
Past Event
About Event

Welcome to AI Safety Poland Talks!

A biweekly series in which researchers, professionals, and enthusiasts from Poland, or with ties to the Polish AI community, share their work on AI safety.

💁 Topic: Jailbreaking Vision-Language Models Through the Visual Modality
📣 Speaker: Jan Dubiński
🇬🇧 Language: English
🗓️ Date: 02.04.2026, 18:00
📍 Location: Online

Speaker Bio
Jan Dubiński is a researcher at the NASK National Research Institute and the Warsaw University of Technology. His work focuses on the safety of generative models, spanning both language and vision systems. He is currently collaborating with Owain Evans’s Truthful AI as part of the Constellation Astra Fellowship. His past research experience includes the MARS Fellowship at the Cambridge AI Safety Hub, internships at the CISPA Helmholtz Center for Information Security and at Sapienza University of Rome, and a collaboration with CERN.

Abstract
Vision-language models (VLMs) combine image understanding with language reasoning, processing visual and textual inputs together to generate a response. While much progress has been made on aligning language models against text-based misuse, the visual modality of VLMs remains comparatively underexplored as a safety risk. We show that harmful intent can be concealed in visually benign inputs and still be recovered by VLMs through context, structure, and inference. For example, in a context-rich image, a harmful object such as a bomb can be replaced with a benign substitute like a banana; the VLM then interprets references to the benign term as referring to the harmful object, leading it to comply with dangerous requests such as bomb-making instructions. Our results show that VLM safety cannot be treated as a mere extension of text-only alignment: robust safety measures must treat the visual modality as a primary target in its own right.
