

LLM Interpretability: From Mechanism to Model Improvement (NICE No. 169)
NICE Talk No. 169 invites Hengyuan Zhang, a first-year Ph.D. student in the HKU Ngai Lab, to speak on LLM Interpretability: From Mechanism to Model Improvement.
Why does a model produce a certain behavior? Where are capabilities stored — in which layers, modules, or representations? And can understanding these internals help us improve the model itself?
This talk covers three lines of work; a brief illustrative sketch of each follows the list:
1. [Locate, Steer, and Improve] A practical survey of actionable mechanistic interpretability in LLMs — organized as a Locate → Steer → Improve pipeline.
2. NSDS — a data-free layer-wise mixed-precision quantization method driven by numerical and structural dual-sensitivity, guided by interpretability analysis.
3. ShifCon — a method that enhances non-dominant-language capabilities via shift-based contrastive learning on multilingual representation subspaces.
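As a rough illustration of the Locate → Steer → Improve pipeline from the first line of work, the sketch below uses a generic activation-steering intervention: a forward hook that adds a direction to one layer's activations. The toy model, the steer_hook name, and the strength alpha are illustrative assumptions, not the survey's method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 16

# Toy stand-in for a stack of transformer blocks.
model = nn.Sequential(
    nn.Linear(d_model, d_model),  # "early layer"
    nn.ReLU(),
    nn.Linear(d_model, d_model),  # "located layer" we intervene on
    nn.ReLU(),
    nn.Linear(d_model, d_model),  # "late layer"
)

# Locate: suppose contrastive prompts gave us a direction whose
# activations separate the behavior of interest.
steer_vec = torch.randn(d_model)
steer_vec = steer_vec / steer_vec.norm()
alpha = 2.0  # steering strength (assumed hyperparameter)

def steer_hook(module, inputs, output):
    # Steer: shift the located layer's activations along the direction.
    return output + alpha * steer_vec

handle = model[2].register_forward_hook(steer_hook)
x = torch.randn(1, d_model)
steered = model(x)
handle.remove()
baseline = model(x)

# Improve: compare behavior with and without the intervention.
print("shift norm:", (steered - baseline).norm().item())
```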
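For the second item, a minimal sketch of data-free, sensitivity-driven mixed-precision bit allocation. NSDS's actual numerical/structural dual-sensitivity metric is not reproduced here; per-layer weight kurtosis stands in as a generic, data-free outlier proxy, and the 8-bit/4-bit split is an assumed policy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = {f"layer{i}": nn.Linear(32, 32) for i in range(6)}

def sensitivity(weight: torch.Tensor) -> float:
    # Stand-in sensitivity score: kurtosis of the weights.
    # Heavy-tailed layers tend to be more quantization-sensitive.
    w = weight.flatten()
    z = (w - w.mean()) / w.std()
    return float((z ** 4).mean())

scores = {name: sensitivity(m.weight) for name, m in layers.items()}

# Mixed precision: keep the most sensitive third at 8 bits, rest at 4.
ranked = sorted(scores, key=scores.get, reverse=True)
bits = {name: (8 if name in ranked[: len(ranked) // 3] else 4)
        for name in ranked}

def quantize(weight: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Symmetric uniform quantization to n_bits, round-to-nearest.
    qmax = 2 ** (n_bits - 1) - 1
    scale = weight.abs().max() / qmax
    return (weight / scale).round().clamp(-qmax, qmax) * scale

for name, m in layers.items():
    m.weight.data = quantize(m.weight.data, bits[name])
    print(name, f"{bits[name]}-bit, sensitivity={scores[name]:.2f}")
```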
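For ShifCon, a sketch of the generic idea only: shift non-dominant-language representations toward the dominant-language subspace, then apply a contrastive objective over parallel pairs. The mean-difference shift and the InfoNCE-style loss below are common stand-ins, not the paper's exact formulation (see arXiv:2410.19453).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 32
dominant = torch.randn(100, d) + 1.0      # e.g. English sentence reps
non_dominant = torch.randn(100, d) - 1.0  # parallel sentences, other language

# Shift: move non-dominant reps along the mean difference of the subspaces.
shift = dominant.mean(0) - non_dominant.mean(0)
shifted = non_dominant + shift

# Contrastive objective: parallel pairs are positives, rest are negatives.
z1 = F.normalize(shifted, dim=-1)
z2 = F.normalize(dominant, dim=-1)
logits = z1 @ z2.T / 0.07                 # temperature 0.07 (assumed)
labels = torch.arange(len(z1))
loss = F.cross_entropy(logits, labels)
print("contrastive loss:", loss.item())
```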
Core insight: Interpretability is not just about "seeing" how a model works — it can be a tool for improving it.
Speaker: Hengyuan Zhang, HKU Ngai Lab. His research focuses on LLM mechanistic interpretability, with publications at ACL, NeurIPS, CVPR, EMNLP, TKDD, and other venues.
Homepage: https://rattlesnakey.github.io/
Paper 1 (Locate, Steer, and Improve): https://arxiv.org/pdf/2601.14004
Paper 2 (NSDS): https://arxiv.org/pdf/2603.17354
Paper 3 (ShifCon): https://arxiv.org/pdf/2410.19453