58 Went

Reviewing - "The Circuits Analysis Research Landscape"

Hosted by AI Safety Initiative - Georgia Tech & Eyas Ayesh
Past Event
About Event

Calling all interpretability enthusiasts and aspiring researchers!

We will be reviewing "The Circuits Research Landscape: Results and Perspectives", a collaborative effort by researchers from DeepMind, Anthropic, Decode, EleutherAI, and Goodfire AI to summarize the state of the field. Stepan Shabalin, a core contributor to this paper, researcher at EleutherAI, and research lead at the AISI, will present the work.

Come enjoy some good food, great science, and even greater discussion.

Talk summary:

This talk will provide an overview of the subfield of circuits-based mechanistic interpretability, from the reasons for studying model behavior mechanistically to the technical details of Anthropic's recent work on transcoder-based circuit tracing. It covers the entire blog post: both the discussion of the open-source implementation (Overall Findings, Implementation Considerations) and the senior authors' pieces on the state of the field (Model Biology, Future Work). It also aims to explain the context of the blog post and the kinds of problems mechanistic interpretability can help solve. The talk will end with a discussion of ongoing efforts in the open-source mechanistic interpretability community and the most impactful ways to contribute to the field.

Location
Van Leer Building
Van Leer Bldg (Electrical and Computer Engineering), 777 Atlantic Dr NW, Atlanta, GA 30332, USA
Room C457