Presented by

Building the future of agent-native engineering. Join the Squad team for technical deep-dives, demo days, and early access to new features. Check us out at trysquad.ai

39 Went

Featured in

Toronto

Workshop: Steering AI Models through Mechanistic Interpretability

Start: 2025-08-13T13:30:00.000-04:00
End: 2025-08-13T15:00:00.000-04:00
Location: New Stadium

Squad

Toronto, Canada

Past Event

About Event

AI is a black box, but that's changing. Labs like Anthropic publish research that pushes towards a glass-box scenario, where humans have full observability into the inner workings of AI models. This workshop walks through the specifics of that research, with introductions to and demos of:

Reverse engineering neural networks (mechanistic interpretability)
How Anthropic/OpenAI mapped out 'features' of their models using sparse autoencoders
Steering models in real time (how Anthropic/OpenAI steer their models, demonstrated with a 3B-parameter model)

Approaches from the scientific literature will be expanded on and broken down so they are accessible to a general audience.
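
For a rough feel of the last two topics before attending, here is a minimal Python sketch of the general idea. It is not the workshop's demo code. The function names (sae_encode, sae_decode, steer), the tiny dimensions, and the random untrained weights are all assumptions made for illustration. A real sparse autoencoder is trained on activations captured from an actual model, and steering is applied inside the model's forward pass.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    """Map a model activation vector to non-negative feature activations (ReLU encoder)."""
    return np.maximum(0.0, x @ W_enc + b_enc)

def sae_decode(f, W_dec, b_dec):
    """Reconstruct the activation vector from feature activations."""
    return f @ W_dec + b_dec

def steer(x, W_dec, feature_idx, strength):
    """Push an activation along one feature's unit-norm decoder direction."""
    direction = W_dec[feature_idx]
    direction = direction / np.linalg.norm(direction)
    return x + strength * direction

# Hypothetical toy sizes and random weights, standing in for a trained SAE.
rng = np.random.default_rng(0)
d_model, n_features = 8, 32
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(n_features, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)                 # stand-in for one activation vector
features = sae_encode(x, W_enc, b_enc)       # which features "fire" on x
reconstruction = sae_decode(features, W_dec, b_dec)
steered = steer(x, W_dec, feature_idx=3, strength=4.0)
```

In this sketch, steering simply shifts the activation by a fixed distance along the chosen feature's direction. With a trained autoencoder, that direction would correspond to an interpretable concept, so the shift nudges the model's behaviour toward it.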
