

Testing Self-Evaluation Bias of LLMs
Past Event
About Event
When building and testing AI agents, one practical question that arises is whether to use the same model for both the agent’s reasoning and the evaluation of its outputs. Keeping the model consistent may simplify the setup and reduce costs, but it also raises concerns about bias, over-familiarity, and inflated scores.
To better understand these trade-offs, we ran an experiment comparing how scores differ when the agent's own model also acts as the judge versus when a separate model handles the evaluation.
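As a minimal sketch of the kind of setup involved (not the exact protocol used in the experiment), the snippet below generates an answer with an "agent" model and then asks both the same model and a different model to score it. The model names, prompt wording, and 1-10 scale are illustrative assumptions; it uses the OpenAI Python client and expects an API key in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_answer(model: str, task: str) -> str:
    """Ask the 'agent' model to produce an answer for the task."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

def judge_answer(judge_model: str, task: str, answer: str) -> str:
    """Ask a judge model to rate the answer on a 1-10 scale."""
    prompt = (
        f"Task: {task}\n\nAnswer: {answer}\n\n"
        "Rate the answer's quality from 1 to 10 and briefly justify the score."
    )
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

task = "Summarize the trade-offs of using one LLM as both agent and evaluator."
agent_model = "gpt-4o-mini"  # hypothetical agent model

answer = generate_answer(agent_model, task)

# Same-model ("self") evaluation vs. evaluation by a different judge model.
self_eval = judge_answer(agent_model, task, answer)
cross_eval = judge_answer("gpt-4o", task, answer)  # hypothetical second model

print("Self-evaluation:", self_eval)
print("Cross-evaluation:", cross_eval)
```

Running a comparison like this over many tasks is one way to surface whether the self-judging setup systematically inflates scores relative to an independent judge.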
Join us to see the results and our take on the implications.