BLISS Reading Group - June 15
Presented by
BLISS Calendar
Registration
Approval Required
Your registration is subject to host approval.
About Event

This week the BLISS Reading Group does something we rarely get to do: we read a paper with its author in the room. Lorenz Hufe, whom some of you will remember from his Mechanistic Interpretability series, joins us to present his own work, freshly accepted at ICLR 2026.

Our paper is Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP (Hufe et al., 2026).

Typographic attacks are deceptively simple: paste the word "iPod" onto an apple and a CLIP model will cheerfully call it an iPod. These text injections cause targeted misclassifications, drive malicious content generation, and can even jailbreak vision-language models. Rather than patching the symptom with finetuning, Hufe et al. go looking for the mechanism and locate a small set of attention heads in the later layers of the vision encoder that causally extract text and pipe it to the CLS token. Ablating this "typographic circuit" produces dyslexic CLIP models that shrug off the attacks (up to +22% on a typographic variant of ImageNet-100) while losing under 1% of clean accuracy, with no training required. The fix even holds up on a medical foundation model for skin-lesion diagnosis.

When is reading the text in an image a feature, and when is it a liability? Can a targeted, interpretable intervention really compete with brute-force finetuning, and does it keep scaling as models grow? And what does it tell us about CLIP that "don't read" turns out to be a localizable, removable behaviour?

Come with the questions you'd normally just mutter at the PDF: this time the author is here to answer them. Join us for a lively discussion!

Location
Merantix AI Campus
Max-Urich-Straße 3, 13355 Berlin, Germany