

NICE Academy - Developing Robust and Trustworthy Foundation Models
YouTube Livestream link: https://youtube.com/live/U0rDRX7ZkYM
Modern language models are powerful but often opaque and fragile. This talk explores how to make model knowledge explicit, testable, and reliable. We examine why models hallucinate or become stale, uncover phenomena like knowledge overshadowing, and show how to diagnose and locally repair failures with minimal side effects. The talk then introduces a framework for representing knowledge as interpretable, composable “atomic skills” that enable modular reasoning and stronger generalization. Finally, we connect interpretable reasoning to real-world decision value by aligning model reasoning with downstream utility. Together, these ideas point toward more interpretable, controllable, and robust foundation models.
Speaker: Yuji Zhang (https://celestinezyj.github.io/), a postdoctoral researcher at the University of Illinois Urbana-Champaign, advised by Prof. Heng Ji and Prof. Chengxiang Zhai. She received her Ph.D. in Computer Science from Hong Kong Polytechnic University. Her research focuses on developing robust and trustworthy foundation models, with an emphasis on understanding their behavior by examining their knowledge mechanisms. Her work has been published in top-tier conferences, including ACL, EMNLP, and ICLR. She organized the AAAI 2025 tutorial on the knowledge lifecycle of LLMs and the ACL 2025 workshop on knowledgeable foundation models.
Host: Haolun Wu (https://haolun-wu.github.io/), a 4th-year PhD candidate at Mila & McGill and a visiting scholar at Stanford. His research interests include trustworthy AI / LLMs, information retrieval, personalization, human-AI alignment, and AI for education. He has interned at Microsoft Research, Google, and DeepMind; his work has been deployed in MSR's Alexandria knowledge base construction and applied to the Google Shopping recommendation platform. He has published in top venues across several areas (e.g., NeurIPS, ICML, ICLR, EMNLP, SIGIR, WWW, CHI, CSCW, TMLR, TKDE) and serves as a reviewer for many of them.