Roundtable Dinner: AI Benchmarking Across Languages hosted by AI Circle & LILT
A Deep Dive with Joern and LILT’s Applied AI Research Team
As AI transitions from static chatbots to autonomous agents capable of multi-step reasoning and tool use, we have hit a critical wall: the English-Centric Evaluation Gap. Most multilingual benchmarks today are built on "translated" versions of English datasets—a process that introduces noise, hallucinations, and "translationese" that makes tasks impossible for even the most capable agents to solve.
In this session, Joern will lead a technical discussion on AI benchmarking across languages and how LILT is redefining what it means to benchmark agentic performance at the frontier.
What We’ll Discuss:
The "Fluent yet Broken" Paradox: Why a translation can be grammatically perfect yet functionally flawed if tool behaviors, locale conventions, or cultural contexts are lost.
GAIA-v2-LILT: A breakdown of how re-auditing the GAIA benchmark recovered an average of +20.7 percentage points in measured performance—proving that current "capability gaps" are often just measurement errors.
Terminal-Bench & tau(3)-bench: Evaluating agentic coding and multi-turn customer support conversations in non-English environments.
Functional and Cultural Alignment: What are the key requirements and pitfalls when transforming English benchmarks into other languages?
The Experience
A Multi-Course Intellectual Tasting
We are pairing a Michelin-starred Mexican dinner with a structured technical "Engagement."
5:30 PM The Warm Up - Cocktails, arrivals, and networking.
6:00 PM The Thesis - Opening note on AI benchmarking.
6:15 PM The Engagement - A curated roundtable. "Bouncers" will be served with each course to drive deep-dive debate.
9:00 PM The Commitment - Closing remarks and the path toward frontier model safety.
About LILT:
LILT is the only AI-native multilingual solution for frontier AI data and enterprise localisation. We help make your data and content multilingual—faster, more accurately, securely, and at scale. Specialising in language-grounded alignment and multimodal evaluation, we provide research-grade expertise to govern AI systems. Unlike crowdsourced options, our curated expert network and continuous quality calibration provide the high-fidelity signals needed to build reliable models ready for global deployment.