Igor Shilov - Knowledge Localization for Capability Removal in LLMs

Privacy, Security & Policy - Cohere Labs Community

Google Meet

About Event

Large Language Models increasingly possess capabilities that carry dual-use risks. While data filtering is a common pretraining mitigation, it is expensive at scale, and even small amounts of mislabeled content can introduce dangerous capabilities. This talk explores an improved variant of Gradient Routing called Selective GradienT Masking (SGTM), with particular focus on evaluating its robustness to label noise.

Igor Shilov is a PhD researcher at Imperial College London, working on privacy and security of AI. Before starting his PhD in 2023, Igor spent over 10 years as a software engineer, most recently as a Research Engineer at Meta, where he worked on privacy-preserving machine learning with a focus on differential privacy and federated learning. His research interests include LLM memorization, adversarial attacks and security, and model internals. In 2025, Igor was part of the inaugural cohort of the Anthropic AI Safety Fellowship, working on knowledge localization in LLMs.

Presented by

Privacy, Security & Policy - Cohere Labs Community

Led by Damani Mguni-Coker and Manuel Villanueva. Part of the Cohere Labs Open Science initiative: https://cohere.com/research/open-science

Hosted By