Hack your way into LLMs
6 Going
Registration
Approval Required
Your registration is subject to host approval.
About Event

Can you read a model's mind before it writes the exploit?

SAIL is hosting a mechanistic interpretability hackathon at EPFL. A curated adversarial dataset. Five coding models. One question: what's actually happening inside when they decide to help an attacker?

Two tracks:

  • Probes as monitors — classify internal activations to detect harmful compliance before generation lands

  • Attribution — pinpoint which input tokens actually push a model toward helping an attacker
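
The probes track boils down to a simple recipe: collect hidden activations for harmful-compliance and benign completions, then train a linear classifier on them. A minimal sketch, using synthetic Gaussian clusters as stand-ins for real residual-stream activations (hidden size and cluster shift are illustrative assumptions):

```python
import numpy as np

# Toy probe: separate "harmful-compliance" vs "benign" activations.
# Real activations would be extracted from a coding model's residual
# stream; synthetic clusters stand in for them here.
rng = np.random.default_rng(0)
d = 64                      # assumed hidden dimension
n = 500                     # examples per class
benign  = rng.normal(0.0, 1.0, (n, d))
harmful = rng.normal(0.0, 1.0, (n, d)) + 0.8   # shifted cluster

X = np.vstack([benign, harmful])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Logistic-regression probe trained with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # sigmoid
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * (p - y).mean()

acc = (((X @ w + b) > 0) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

On real activations the same linear probe, read out mid-generation, is what lets a monitor flag harmful compliance before the completion lands.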

What you get: Claude Code credits, GPU instances, mentors, and food all weekend. A real shot at a publishable result in 48 hours.

Prizes: CHF 1,000 · CHF 500 · CHF 300

PyTorch fluency + curiosity about model internals. That's the bar.

Register by May 6th

May 9–10 · Teams of 5 · EPFL · Hosted by SAIL × MLO Lab

Location
EPFL
1015 Lausanne, Switzerland