AI Evals w/ Satyapriya Krishna — Evaluating safety and trustworthiness of foundation models
When: Thursday, Oct 30, 11:00am PT
Where: Zoom (link provided by alphaXiv); the recording will later be uploaded to the alphaXiv YouTube channel
Zoom link: https://stanford.zoom.us/j/95904059062?pwd=0ErKmwUCab6qBSNls8oUhmeF1pzeIo.1&from=addon
About Event
🗓 Thursday, October 30th, 2025 · 11AM PT
🎙 Featuring Satyapriya Krishna
💬 Casual Talk + Open Discussion
Evaluating safety and trustworthiness of foundation models
We are excited to host Satyapriya Krishna, a Research Scientist at Amazon AGI Labs who completed his PhD at Harvard on the trustworthiness of generative language models, to share his recent work on frontier safety evaluation: D-REX. He will dive deep into the D-REX benchmark, a novel suite designed to uncover the critical, underexplored risk of deceptive reasoning in LLMs. The benchmark specifically exposes the discrepancy between a model's malicious internal chain-of-thought and its seemingly innocuous final output, a vulnerability that bypasses current output-centric safety mechanisms. Satya will also discuss his broader perspectives on the LLM safety and evaluation space.
The Zoom link will be shared upon registration. The talk will later be uploaded to the alphaXiv YouTube channel.
Hosted by: alphaXiv x Vals AI
AI Evals: join the community
