Presented by

The world’s largest AI community. Uniting 200k+ pioneers across 100+ global forums. Building the human layer for the AI era.

Featured in

New York Tech Week

Lost in Translation: The Future of Multilingual AI Evaluations - #NYTechWeek

Name: Lost in Translation: The Future of Multilingual AI Evaluations - #NYTechWeek
Start: 2026-06-02T17:00:00.000-04:00
End: 2026-06-02T19:00:00.000-04:00
Location: New York, NY

The AI Collective

New York, NY

Past Event

About Event

An invite-only evening on agentic AI evaluation across languages

As AI transitions from static chatbots to autonomous agents capable of multi-step reasoning and tool use, we've hit a critical wall: the English-Centric Evaluation Gap.

Most multilingual benchmarks are just translated English datasets — and that translation introduces noise, hallucinations, and "translationese" that quietly breaks tasks the best agents should be able to solve. The result: we're mistaking measurement errors for capability gaps.

Over drinks and a closed-door conversation with Spence Green (CEO, LILT), we'll get into the actual taxonomy of pitfalls, from instructional leakage to cultural anchor bias, and the frameworks needed to clean the yardstick before the next generation of global models is built on top of it.

This is a researcher-to-researcher conversation, not a product pitch. We're keeping the room small and the discussion technical.

What We'll Cover

The "Fluent yet Broken" Paradox: Why a translation can be grammatically perfect yet functionally flawed when tool behaviours, locale conventions, or cultural contexts are lost.
GAIA-v2-LILT: How re-auditing the GAIA benchmark recovered an average of +20.7 percentage points in measured performance, proving that current "capability gaps" are often just measurement errors.
Terminal-Bench & τ³-bench: Evaluating agentic coding and multi-turn customer support conversations in non-English environments.
Functional and Cultural Alignment: The key requirements and pitfalls when transforming English benchmarks into other languages.

Programming / Schedule

5:00 PM - Arrivals, drinks, and networking
6:30 PM - Welcome from The AI Collective & fireside discussion with Spence Green
7:00 PM - Drinks & food continue

LILT is the only AI-native multilingual solution for frontier AI data and enterprise localisation. Specialising in language-grounded alignment and multimodal evaluation, LILT provides research-grade expertise to govern AI systems at scale across 200+ languages.

The AI Collective is a community of practitioners across research and deployment advancing the frontier of AI.
