Presented by
The AI Collective
The world’s largest AI community. Uniting 200k+ pioneers across 100+ global forums. Building the human layer for the AI era.

Break Frontier AI — In Your Language

Virtual
Registration
Approval Required
Your registration is subject to host approval.
Welcome! To join the event, please register below.
About Event

LILTBench Hackathon (Hosted by LILT × The AI Collective)

Can you design a coding task that breaks the world’s best AI models — in your language?

While frontier LLMs demonstrate high proficiency on English-centric benchmarks, their capabilities often degrade sharply when processing complex instructions, nuances, or data in other languages.

LILTBench invites applied AI researchers and evaluation engineers to identify, formalize, and benchmark these cross-lingual vulnerabilities. Your objective is to design rigorous evaluation tasks that expose systematic non-English performance gaps in today’s most advanced models.

The Evaluation Architecture

To ensure scientific rigor, all submissions will undergo automated evaluation via a production-grade benchmarking pipeline:

  • Target Model: Claude Opus 4.6

  • Framework: Terminal-Bench

  • Agent Harness: Terminus 2

  • Execution: Accepted tasks are run through 15 deterministic iterations to map exact pass/fail boundaries.


📅 Schedule

  • June 15 (Mon) — Kickoff webinar: rules, workflow, evaluation rubric, live demo. Repo made public

  • June 15–21 — Hackathon week: design, develop, test, and submit tasks

  • June 21 (Sun) 11:59 PM UTC — Code freeze (all PRs must be passing CI by this deadline)

  • June 22–23 — Evaluation: accepted tasks run against Claude Opus 4.6 (15 iterations each)

  • June 24 (Wed) — Awards webinar: winners announced, top tasks showcased


🏆 Prizes & Recognition

Beyond contributing to the advancement of multilingual AI safety and evaluation, top performers will receive:

  • Global Visibility: The top 5 winners will be featured prominently in the AI Collective Newsletter and across corporate channels.

  • Cash Prizes: Tiered awards of $1,500 for 1st place, $1,000 for 2nd, $500 for 3rd, etc.

Scoring Rubric

Points are weighted heavily by task difficulty—we value profound, formalized edge cases over volume.

  • Easy (13–15 passes out of 15): 1 point

  • Medium (9–12 passes): 2 points

  • Hard (4–8 passes): 4 points

  • Very hard (0–3 passes): 8 points

💡 Note: Quality over quantity. A single "Very Hard" task (8 points) nets a higher score than four "Easy" tasks (4 points).
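The rubric above can be sketched as a small lookup from pass count to points. This is only an illustration of the arithmetic, not the organizers' scoring code: the names `TIERS`, `score_task`, and `total_score` are hypothetical, and summing points across a participant's tasks is an assumption inferred from the note above.

```python
from typing import Iterable

RUNS_PER_TASK = 15  # each accepted task is run 15 times

# (min_passes, max_passes, tier, points), copied from the rubric.
TIERS = [
    (13, 15, "Easy", 1),
    (9, 12, "Medium", 2),
    (4, 8, "Hard", 4),
    (0, 3, "Very hard", 8),
]


def score_task(passes: int) -> tuple[str, int]:
    """Map a task's pass count (out of 15 runs) to its tier and points."""
    if not 0 <= passes <= RUNS_PER_TASK:
        raise ValueError(f"passes must be between 0 and {RUNS_PER_TASK}")
    for low, high, tier, points in TIERS:
        if low <= passes <= high:
            return tier, points
    raise AssertionError("rubric tiers should cover 0..15")


def total_score(pass_counts: Iterable[int]) -> int:
    """Sum points over several tasks (assumed aggregation, see lead-in)."""
    return sum(score_task(p)[1] for p in pass_counts)
```

For example, `total_score([2])` gives 8 for one "Very hard" task, while `total_score([15, 14, 13, 13])` gives 4 for four "Easy" tasks, matching the note above.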


Location

Virtual

(Exact joining details provided after registration approval.)


For detailed information about submission and evaluation, please visit this Notion page: https://www.notion.so/lilt/LILTBench-Hackathon-361c66a75a508039bf00c9303a85ed3b
