Presented by
BenchFlow
209 Went

Skillathon - The First Agent Skills Hackathon

San Francisco, California
Past Event
About Event

​IMPORTANT: Building C, not B. On floor 3. Follow the signs to the right of Goody cafe!

AI agents are powerful, but they still fail at real work: they don't follow deterministic workflows, and they lack tacit knowledge. Agent Skills address this by injecting transferable procedural knowledge into the agent's context, so agents can handle hard problems in coding and beyond.

We're the creators of SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks, the largest benchmark for agent skills, and Sundial, the largest registry and toolbox for building and improving skills.

Over the last month, BenchFlow collected 86 tasks across 11 professional domains from 180 contributors. 80% of our task contributors are PhDs or senior professionals. Sundial gathered more than 50,000 community skills and helped dozens of teams turn their workflows into skills.

Building on this momentum, we are introducing Skillathon Episode #1. It's the first Agent Skills Hackathon in 2026 and will be hosted at Founders, Inc. The Skillathon is part of the growing research community around skills, and its goal is to bring builders together to craft quality skills and the tasks to evaluate them. This will help us understand the best practices for making effective skills and extend the SkillsBench benchmark with new tasks and domains contributed by practitioners who know their fields best.

Sponsors:

  • Nous Research: check out their Hermes Agent

  • Founders, Inc. and Nebula.gg

  • Sundial

  • Abundant

  • IncidentFox

  • Daytona

Speakers:

  • Belinda Mo from Sundial

  • Bence Nagy from Anthropic

  • Xiangyi Li on behalf of Nous Research

  • Furqan Rydhan, co-founder of Founders, Inc., thirdweb, Nebula

​Tracks:

  • Data track: your goal is to come up with a realistic task scenario that is complex enough that the leading frontier models and agents fail it, or that takes a lot of effort and a long horizon to solve. As part of the task, you will need to come up with an agent skill for tasks in the domain you are developing. For example, if your task involves building pivot tables in Excel, don't make an atomic skill such as "how to make pivot tables"; instead, try modifying Anthropic's default xlsx skill or creating another complete skill set. Below is the list of tracks we are hosting for this hackathon. The taxonomy is grouped by a mix of skill sets and roles in the economy.

    • Computer Science. Software engineering, machine learning, cybersecurity.

    • Physical world. Robotics, manufacturing, energy, infrastructure.

    • Professional. Healthcare, finance, office suite, insurance.

    • Natural Science. Physics, mathematics, chemistry, biology, etc.

    • OpenClaw. Design orchestrator skills that coordinate multiple modules into coherent build pipelines for game development — asset generation, character design, world building, dialogue systems, modding tools, testing harnesses, or live game operations. This track treats gaming not as a training environment, but as a rich, real-world domain for composable AI tooling.

    • Examples are available on SkillsBench. The task/skill format we use is optional for the purposes of this hackathon. Create whatever skill you like and install it on whatever agent you like. We recommend checking out this tweet from Anthropic about updates to Skill creator: https://x.com/RLanceMartin/status/2028901056818930171

  • Continual learning track: There are many ways to improve models or prompts, such as Recursive Language Models and GEPA at the in-context learning layer, or RL at the model layer.

Prizes:

  • $1k for 1st place

  • $500 for 2nd place

  • $400 for 3rd place

Judges:

  • Xiangyi Li

  • Belinda Mo and Florent Tavernier from Sundial

  • Roey Ben Chaim from Zenity

  • Long Yi from IncidentFox

  • Jimmy Wei from IncidentFox

  • Daniel Wang from Abundant

​Organized by:

- Xiangyi Li: Founder of BenchFlow, author of SkillsBench, Harbor, Terminal-Bench, etc.
- Roey Ben Chaim: Staff Engineer at Zenity (ex-Microsoft), organizer of AI Tinkerers Tel Aviv.
- Belinda Mo: Founder of Sundial, previously founded Viva Translate. BS and MS from Stanford.
- Florent Tavernier: Founder of Sundial. Previously founded Self Protocol and Atelier Missor.
