Generative AI-focused workshops, hackathons, and more. Come build with us!

Arize AI

About a year ago, Arize AI released early research on how reliable foundation models are as an LLM-as-a-Judge when the eval output is a binary label versus a numeric score. The results were clear: binary evals were the way to go. More than a year later, the models have improved. Does the research still hold?

In this session, Elizabeth Hutton (Senior AI Engineer) and Srilakshmi Chavali (AI Engineer) will dive into findings from newly released research!

How To Navigate Binary vs Score Evals
