Presented by

The world’s largest AI community. Uniting 200k+ pioneers across 100+ global forums. Building the human layer for the AI era.

182 Went

Break Frontier AI — In Your Language

The AI Collective

Virtual

Past Event

Welcome! To join the event, please register below.

You will be asked to verify token ownership with your wallet.

About Event

LILTBench Hackathon (Hosted by LILT × The AI Collective)

Registration

Approval Required

Your registration is subject to host approval.

Can you design a coding task that breaks the world’s best AI models — in your language?

While frontier LLMs demonstrate high proficiency on English-centric benchmarks, their capabilities often degrade sharply when they process complex instructions, nuances, or data in other languages.

LILTBench invites applied AI researchers and evaluation engineers to identify, formalize, and benchmark these cross-lingual vulnerabilities. Your objective is to design rigorous evaluation tasks that expose systematic non-English performance gaps in today’s most advanced models.

The Evaluation Architecture

To ensure scientific rigor, all submissions will undergo automated evaluation via a production-grade benchmarking pipeline:

Target Model: Claude Opus 4.6
Framework: Terminal-Bench
Agent Harness: Terminus 2
Execution: Accepted tasks are run through 15 deterministic iterations to map exact pass/fail boundaries.

📅 Schedule

June 15 (Mon) — Kickoff webinar: rules, workflow, evaluation rubric, live demo. Repo made public
June 15–21 — Hackathon week: design, develop, test, and submit tasks
June 21 (Sun) 11:59 PM UTC — Code freeze (all PRs must be passing CI by this deadline)
June 22–23 — Evaluation: accepted tasks run against Claude Opus 4.6 (15 iterations each)
June 24 (Wed) — Awards webinar: winners announced, top tasks showcased

🏆 Prizes & Recognition

Beyond contributing to the advancement of multilingual AI safety and evaluation, top performers will receive:

Global Visibility: The top 5 winners will be featured prominently in the AI Collective Newsletter and across corporate channels.
Cash Prizes: Tiered awards, with $1,500 for 1st place, $1,000 for 2nd, $500 for 3rd, etc.

Scoring Rubric

Points are weighted heavily by task difficulty—we value profound, formalized edge cases over volume.

Easy (13–15 passes out of 15): 1 point
Medium (9–12 passes): 2 points
Hard (4–8 passes): 4 points
Very hard (0–3 passes): 8 points

💡 Note: Quality over quantity. A single "Very Hard" task (8 points) nets a higher score than four "Easy" tasks (4 points).
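
The rubric above can be sketched in a few lines of Python. This is only an illustration, not the organizers' scoring code: the names `TIERS`, `score_task`, and `total_score` are hypothetical, and summing points across a participant's tasks is an assumption inferred from the note above (four Easy tasks counted as 4 points). The thresholds and point values come directly from the rubric.

```python
# Illustrative sketch of the scoring rubric; not the official pipeline.
# All names here (TIERS, score_task, total_score) are hypothetical.

ITERATIONS = 15  # each accepted task is run 15 times

# (min_passes, max_passes, tier_name, points), as listed in the rubric
TIERS = [
    (13, 15, "Easy", 1),
    (9, 12, "Medium", 2),
    (4, 8, "Hard", 4),
    (0, 3, "Very hard", 8),
]


def score_task(passes: int) -> tuple[str, int]:
    """Map a task's pass count (out of 15) to its tier name and points."""
    if not 0 <= passes <= ITERATIONS:
        raise ValueError(f"passes must be between 0 and {ITERATIONS}")
    for low, high, name, points in TIERS:
        if low <= passes <= high:
            return name, points
    raise AssertionError("tiers cover 0..15, so this is unreachable")


def total_score(pass_counts: list[int]) -> int:
    """Assumed aggregation: sum the points of every submitted task."""
    return sum(score_task(p)[1] for p in pass_counts)


if __name__ == "__main__":
    print(score_task(2))                  # one task the model passes 2/15 times
    print(total_score([15, 14, 13, 15]))  # four tasks the model mostly passes
```

Under these assumptions, a single task passed 2 times out of 15 lands in the "Very hard" tier and earns 8 points, while four tasks passed 13 or more times each earn 4 points in total, which matches the note above.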

Location

Virtual

(Exact joining details provided after registration approval.)

For detailed information about submission and evaluation, see the Notion page: https://www.notion.so/lilt/LILTBench-Hackathon-361c66a75a508039bf00c9303a85ed3b