

Skillathon - The First Agent Skills Hackathon
AI agents are powerful -- but they still fail at real work. Agent Skills address this by injecting transferable procedural knowledge that lets agents tackle computer work, from coding and beyond. We previously created SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks. Over the last month, we collected 86 tasks across 11 professional domains from 180 contributors; 80% of our task contributors are PhDs or senior professionals.
Built on the momentum of SkillsBench, we introduce Skillathon Episode #1: the first Agent Skills hackathon of 2026, hosted at Founders, Inc. Skillathon is part of the growing research community around SkillsBench, extending the benchmark with new tasks and domains contributed by practitioners who know their fields best. SkillsBench uses the Harbor format, and this hackathon's tasks and evaluations will use it as well.
Tracks:
Data track: your goal is to come up with a realistic task scenario that is complex enough that it fails the most frontier models and agents, or takes a lot of effort and a long horizon to solve. In the setting of the task, you will need to come up with an agent skill for tasks in the domain you are developing. For example, say you try to pivot tables in an excel, instead of making an atomic skill like how to make pivot tables, try to modify the anthropic's default xlsx skill or create another complete skill set. Below are the list of tracks we are hosting for this hackathon. The taxonomy is grouped by a mix of skill sets and roles in the economy.
- Computer Science: software engineering, machine learning, cybersecurity.
- Physical World: robotics, manufacturing, energy, infrastructure.
- Professional: healthcare, finance, office suite, insurance.
- Natural Science: physics, mathematics, chemistry, biology, etc.
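For teams new to authoring Agent Skills: a skill is typically a folder containing a SKILL.md whose YAML frontmatter gives the name and description the agent uses to decide when to load it, plus any supporting files. The sketch below is illustrative (the skill name, headings, and steps are made up for this example), not an official template:

```markdown
---
name: xlsx-reporting
description: Build and validate Excel reports, including pivot tables and charts.
---

# Excel reporting

## When to use
Use this skill when the task involves creating or auditing .xlsx workbooks.

## Procedure
1. Inspect the workbook structure before editing.
2. Build pivot tables from a validated source range.
3. Verify totals against the raw data before saving.
```

Note how the skill captures a reusable procedure for a whole domain rather than a single atomic trick.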
Continual learning track: there are already many ways to improve models or prompts, such as Recursive Language Models and GEPA at the in-context-learning layer, or RL at the model layer. We will select 50 tasks from SkillsBench (keeping runs fast while maintaining statistical significance). Your goal is to come up with a better way to iteratively improve skills. We will evaluate each method by applying it to the original SkillsBench to see whether it improves the score, and by adding skills to other benchmarks such as Terminal-Bench.
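To make "iteratively improve the skills" concrete, here is a minimal sketch of a naive baseline this track should beat: a greedy hill climb over the skill text. `evaluate` and `propose_edit` are hypothetical stand-ins for running the agent on Harbor tasks and for an LLM proposing a skill revision; they are toy functions, not part of SkillsBench.

```python
import random

def evaluate(skill: str, tasks: list[str]) -> float:
    """Stand-in for running an agent with `skill` on the selected tasks.
    Toy scoring: rewards longer, more specific skill text, capped at 1.0."""
    return min(1.0, len(skill) / 100)

def propose_edit(skill: str) -> str:
    """Stand-in for an LLM proposing a revised skill."""
    step = random.choice(["check inputs.", "verify output.", "log errors."])
    return skill + " Step: " + step

def hill_climb(skill: str, tasks: list[str], iters: int = 20) -> tuple[str, float]:
    """Greedy loop: keep a proposed edit only if it raises the score."""
    best, best_score = skill, evaluate(skill, tasks)
    for _ in range(iters):
        candidate = propose_edit(best)
        score = evaluate(candidate, tasks)
        if score > best_score:  # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score

skill, score = hill_climb("Use pandas to read the sheet.", tasks=["t1", "t2"])
print(round(score, 2))  # prints 1.0 with this toy scorer
```

A submitted method would replace both stand-ins with real Harbor evaluations and model-generated edits, and ideally improve on this greedy accept-if-better rule.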
Organized by:
- Xiangyi Li: Founder of BenchFlow; author of SkillsBench, Harbor, Terminal-Bench, etc.
- Roey Ben Chaim: Staff Engineer at Zenity (ex-Microsoft), organizer of AI Tinkerers Tel Aviv
- Belinda Mo: Founder of Sundial; previously founded Viva Translate. BS and MS from Stanford.
- Florent Tavernier: Founder of Sundial. Previously founded SelfProtocol and Atelier Missor
- Grace Zhang: Founder of World Intelligence, multimodal data infrastructure for physical AI. Host of Physical AI Hack