Presented by
The AI Collective
The world’s largest AI community. Uniting 200k+ pioneers across 100+ global forums. Building the human layer for the AI era.

Lost in Translation: The Future of Multilingual AI Evaluations - #NYTechWeek

Register to See Address
New York, NY
Registration
Approval Required
Your registration is subject to host approval.
About Event

An invite-only evening on agentic AI evaluation across languages

As AI transitions from static chatbots to autonomous agents capable of multi-step reasoning and tool use, we've hit a critical wall: the English-Centric Evaluation Gap.

Most multilingual benchmarks are just translated English datasets — and that translation introduces noise, hallucinations, and "translationese" that quietly breaks tasks the best agents should be able to solve. The result: we're mistaking measurement errors for capability gaps.

Over drinks and a closed-door conversation with Spence Green (CEO, LILT), we'll get into the actual taxonomy of pitfalls, from instructional leakage to cultural anchor bias, and the frameworks needed to clean the yardstick before the next generation of global models is built on top of it.

This is a researcher-to-researcher conversation, not a product pitch. We're keeping the room small and the discussion technical.


What We'll Cover

  • The "Fluent yet Broken" Paradox: Why a translation can be grammatically perfect yet functionally flawed when tool behaviours, locale conventions, or cultural contexts are lost.

  • GAIA-v2-LILT: How re-auditing the GAIA benchmark recovered an average of +20.7 percentage points in measured performance, proving that current "capability gaps" are often just measurement errors.

  • Terminal-Bench & τ³-bench: Evaluating agentic coding and multi-turn customer support conversations in non-English environments.

  • Functional and Cultural Alignment: The key requirements and pitfalls when transforming English benchmarks into other languages.


Programming / Schedule

  • 5:00 PM - Arrivals, drinks, and networking

  • 6:30 PM - Welcome from The AI Collective and fireside discussion with Spence Green

  • 7:00 PM - Drinks & food continue


LILT is the only AI-native multilingual solution for frontier AI data and enterprise localisation. Specialising in language-grounded alignment and multimodal evaluation, LILT provides research-grade expertise to govern AI systems at scale across 200+ languages.

The AI Collective is a community of practitioners across research and deployment advancing the frontier of AI.

Location
Please register to see the exact location of this event.
New York, NY