Hosted by Sahai @Artpark, IISc
49 Going

Empowering domain experts with AI evals in non-profits

Google Meet
Registration
Approval Required
Your registration is subject to host approval.
About Event

“Bad UX” is the hidden problem of AI evaluation tools in non-profits today.

A lot of cool (and useful) tools exist out there (e.g. Langfuse), but they see almost zero adoption. Most are either too expensive, too hard to set up, or simply not usable, especially by a team of non-technical people.

Teams that are technically well-staffed (a minority in the social sector) often end up (re-)building their own evaluation suites. The rest are often left with no option but ad-hoc vibe checks.

Calibrate was born out of the need to empower domain experts: https://calibrate.artpark.ai/

It is an open-source AI agent evaluation interface built for non-profits. We are productising the best practices we have learnt over multiple decades of combined experience (and a lot of sleepless nights).

In the past, I have built several tools, but there was a key gap every time: I was never the intended user.

That changes with Calibrate.

Most of my evals now run within Calibrate, so the hard part left is the thinking and the rigor, not the implementation. A few examples:

- Evaluating the quality of agent responses with multiple LLM judges customised for each use case

- Testing whether the agent calls the right tools during a given conversation

- Finding the best LLM for every use case (across Claude, Gemini, OpenAI, DeepSeek, Qwen, etc.)

- Capturing human labels, measuring the consistency of those labels, and tracking human alignment with LLM judges

- Finding the best speech-to-text provider across every language for my dataset

- Simulating conversations with my agent for different user personas before I deploy a change
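To make one of the items above concrete, here is a minimal, hypothetical sketch (not Calibrate's actual API) of what "tracking human alignment with LLM judges" can mean: given pass/fail verdicts from a human rater and an LLM judge on the same agent responses, compute raw agreement and Cohen's kappa, a chance-corrected agreement score.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where the raters gave the same verdict.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labelled independently at their own base rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts on six agent responses (illustrative data only).
human = ["pass", "pass", "fail", "pass", "fail", "pass"]
judge = ["pass", "fail", "fail", "pass", "fail", "pass"]

print(round(cohens_kappa(human, judge), 3))  # → 0.667
```

A kappa near 1 suggests the judge can stand in for the human on that use case; a kappa near 0 means the judge agrees no more than chance and its rubric needs work.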

You can connect your existing agent and create your first eval in three simple steps or fewer.

Jigar and I will conduct our first community webinar on evaluating AI agents for non-profits next week.

We will set up evals for one use case, identify where our agent fails, and iteratively improve it, live in front of you, while answering any questions you might have, related or unrelated to whatever we show.

Come join us. It will be fun!

If nothing else, you will at least find honesty.
