AI Evals w/ Satyapriya Krishna — Evaluating safety and trustworthiness of foundation models
When: Thursday, Oct 30, 11:00am PT
Where: Zoom (link provided by alphaXiv); the recording will later be uploaded to the alphaXiv YouTube channel
Zoom link: https://stanford.zoom.us/j/95904059062?pwd=0ErKmwUCab6qBSNls8oUhmeF1pzeIo.1&from=addon
About Event
🗓 Thursday, October 30th, 2025 · 11AM PT
🎙 Featuring Satyapriya Krishna
💬 Casual Talk + Open Discussion
Evaluating safety and trustworthiness of foundation models
We are excited to host Satyapriya Krishna, a Research Scientist at Amazon AGI Labs who completed his PhD at Harvard on the trustworthiness of generative language models, to share his recent work on frontier safety evaluation: D-REX. He will dive deep into the D-REX benchmark, a novel suite designed to uncover the critical, underexplored risk of deceptive reasoning in LLMs. The benchmark specifically exposes the discrepancy between a model's malicious internal chain-of-thought and its seemingly innocuous final output, a vulnerability that bypasses current output-centric safety mechanisms. Satya will also discuss his broader perspectives on the LLM safety and evaluation space.
The Zoom link will be shared upon registration. The talk will later be uploaded to the alphaXiv YouTube channel.
Hosted by: alphaXiv x Vals AI
AI Evals: join the community
