

Igor Shilov - Knowledge Localization for Capability Removal in LLMs
Large Language Models increasingly possess capabilities that carry dual-use risks. While data filtering is a common pretraining mitigation, it is expensive at scale, and even small amounts of mislabeled content can introduce dangerous capabilities. This talk explores an improved variant of Gradient Routing called Selective GradienT Masking (SGTM), with a particular focus on evaluating its robustness to label noise.
Igor Shilov is a PhD researcher at Imperial College London, working on the privacy and security of AI. Before starting his PhD in 2023, Igor spent over 10 years as a software engineer, most recently as a Research Engineer at Meta, where he worked on privacy-preserving machine learning with a focus on differential privacy and federated learning. His research interests include LLM memorization, adversarial attacks and security, and model internals. In 2025, Igor was part of the inaugural cohort of the Anthropic AI Safety Fellowship, working on knowledge localization in LLMs.