

Skillathon - The First Agent Skills Hackathon
AI agents are powerful -- but they still fail at real work. Agent Skills address this by injecting transferable procedural knowledge that lets agents tackle computer work, from coding and beyond. We previously created SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks. Over the last month, we collected 86 tasks across 11 professional domains from 180 contributors; 80% of our task contributors are PhDs or senior professionals.
Built on the momentum of SkillsBench, we introduce Skillathon Episode #1: the first Agent Skills hackathon of 2026, hosted at Founders, Inc. Skillathon is part of the growing research community around SkillsBench, extending the benchmark with new tasks and domains contributed by practitioners who know their fields best. SkillsBench uses the Harbor format, and this hackathon's tasks and evaluations will use it as well.
Tracks:
Data track: your goal is to come up with a realistic task scenario that is complex enough that it fails the most frontier models and agents, or takes a lot of effort and a long horizon to solve. In the setting of the task, you will need to come up with an agent skill for tasks in the domain you are developing. For example, say you try to pivot tables in an excel, instead of making an atomic skill like how to make pivot tables, try to modify the anthropic's default xlsx skill or create another complete skill set. Below are the list of tracks we are hosting for this hackathon. The taxonomy is grouped by a mix of skill sets and roles in the economy.
- Computer Science: software engineering, machine learning, cybersecurity.
- Physical World: robotics, manufacturing, energy, infrastructure.
- Professional: healthcare, finance, office suite, insurance.
- Natural Science: physics, mathematics, chemistry, biology, etc.
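For teams new to authoring Agent Skills: a skill is typically a folder containing a SKILL.md whose YAML frontmatter gives the name and description the agent uses to decide when to load it, plus any supporting files. The sketch below is illustrative (the skill name, headings, and steps are made up for this example), not an official template:

```markdown
---
name: xlsx-reporting
description: Build and validate Excel reports, including pivot tables and charts.
---

# Excel reporting

## When to use
Use this skill when the task involves creating or auditing .xlsx workbooks.

## Procedure
1. Inspect the workbook structure before editing.
2. Build pivot tables from a validated source range.
3. Verify totals against the raw data before saving.
```

Note how the skill captures a reusable procedure for a whole domain rather than a single atomic trick.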
Continual learning track: there are already many ways to improve models or prompts, such as Recursive Language Models and GEPA at the in-context-learning layer, or RL at the model layer. We will select 50 tasks from SkillsBench (keeping runs fast while maintaining statistical significance). Your goal is to come up with a better way to iteratively improve skills. We will evaluate each method by applying it to the original SkillsBench to see whether it improves the score, and by adding skills to other benchmarks such as Terminal-Bench.
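To make "iteratively improve the skills" concrete, here is a minimal sketch of a naive baseline this track should beat: a greedy hill climb over the skill text. `evaluate` and `propose_edit` are hypothetical stand-ins for running the agent on Harbor tasks and for an LLM proposing a skill revision; they are toy functions, not part of SkillsBench.

```python
import random

def evaluate(skill: str, tasks: list[str]) -> float:
    """Stand-in for running an agent with `skill` on the selected tasks.
    Toy scoring: rewards longer, more specific skill text, capped at 1.0."""
    return min(1.0, len(skill) / 100)

def propose_edit(skill: str) -> str:
    """Stand-in for an LLM proposing a revised skill."""
    step = random.choice(["check inputs.", "verify output.", "log errors."])
    return skill + " Step: " + step

def hill_climb(skill: str, tasks: list[str], iters: int = 20) -> tuple[str, float]:
    """Greedy loop: keep a proposed edit only if it raises the score."""
    best, best_score = skill, evaluate(skill, tasks)
    for _ in range(iters):
        candidate = propose_edit(best)
        score = evaluate(candidate, tasks)
        if score > best_score:  # accept only strict improvements
            best, best_score = candidate, score
    return best, best_score

skill, score = hill_climb("Use pandas to read the sheet.", tasks=["t1", "t2"])
print(round(score, 2))  # prints 1.0 with this toy scorer
```

A submitted method would replace both stand-ins with real Harbor evaluations and model-generated edits, and ideally improve on this greedy accept-if-better rule.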
Organized by:
- Xiangyi Li: Founder of BenchFlow; author of SkillsBench, Harbor, Terminal-Bench, etc.
- Roey Ben Chaim: Staff Engineer at Zenity (ex-Microsoft), organizer of AI Tinkerers Tel Aviv
- Belinda Mo: Founder of Sundial; previously founded Viva Translate. BS and MS from Stanford.
- Florent Tavernier: Founder of Sundial. Previously founded SelfProtocol and Atelier Missor
- Grace Zhang: Founder of World Intelligence, multimodal data infrastructure for physical AI. Host of Physical AI Hack