Avatar for Squad
Presented by
Squad
39 Went

Workshop: Steering AI Models through Mechanistic Interpretability

Past Event
About Event

AI is a black box, but that's changing. Labs like Anthropic publish research that pushes toward a glass-box scenario, where humans have full observability into the inner workings of AI models. This workshop digs into the specifics of that research, with intros and demos covering:

  • Reverse Engineering Neural Networks (mechanistic interpretability)

  • How Anthropic/OpenAI mapped out 'features' of their models using Sparse Autoencoders

  • Steering Models in Real Time (how Anthropic/OpenAI steer their models, demonstrated with a 3B-parameter model)

Approaches from the scientific literature will be expanded upon, with the goal of breaking the material down so it is accessible to a general audience.

Location
New Stadium
83 Walnut Ave, Toronto, ON M5V 2S1, Canada