92 Going

AI Evals w/ Satyapriya Krishna — Evaluating safety and trustworthiness of foundation models

Hosted by Vals AI & alphaXiv
Zoom
Registration
Welcome! To join the event, please register below.
About Event

When: Thursday, Oct 30, 11:00am PT

Where: Zoom (link below); the recording will later be uploaded to the alphaXiv YouTube channel

Zoom link: https://stanford.zoom.us/j/95904059062?pwd=0ErKmwUCab6qBSNls8oUhmeF1pzeIo.1&from=addon


🎙 Featuring Satyapriya Krishna
💬 Casual Talk + Open Discussion

Evaluating safety and trustworthiness of foundation models
We are excited to host Satyapriya Krishna, a Research Scientist at Amazon AGI Labs who completed his PhD at Harvard on the trustworthiness of generative language models. He will share his recent frontier safety evaluation work, diving deep into D-REX, a novel benchmark suite designed to uncover the critical, underexplored risk of deceptive reasoning in LLMs. D-REX specifically exposes the discrepancy between a model's malicious internal chain-of-thought and its seemingly innocuous final output, a vulnerability that bypasses current output-centric safety mechanisms. Satya will also discuss his broader perspectives and outlook on the LLM safety and evaluation space.

Zoom link will be shared upon registration. The talk will later be uploaded to alphaXiv's YouTube channel.


Hosted by: alphaXiv x Vals AI
AI Evals: join the community
