Daytona AI Researchers - Berkeley, May 2026
Welcome to our first meetup dedicated to AI researchers at the University of California, Berkeley!
Agenda
🕒 5:00 pm – 5:05 pm
Welcome and Opening Remarks
🕒 5:05 pm – 5:20 pm
Talk "Today's Agents Don't Live In Episodes"
🎤 Muhammad Annas Hashmi, DevRel at Daytona
Outline:
The 'episode' (short, stateless, resettable) has been RL's foundational abstraction since Atari. It underpins the Gym API, GRPO, PPO, and the conventional sandbox lifecycle. Today's agents no longer fit it. Tasks span days; the environment state at hour 18 of an agent session (warm caches, installed dependencies, live processes, open sockets, a dirty git tree) is worth hours of wall-clock time to reproduce.
Three things are scaling simultaneously. Rollout horizon: seconds -> days. Environment state: disposable between episodes -> first-class learning substrate. Branching: absent in modern LLM-RL -> speculative fork trees. Each stresses the inherited toolkit in a different way, and all three have been gated on the same missing primitives: VMs you can fork cheaply, pause without killing processes, snapshot mid-run, and resume hours later.
This talk walks through what opens up when those primitives become available, with a live demo of long-horizon sessionful rollouts, mid-trajectory forking, and cross-calendar-time training. The research questions that follow (long-horizon benchmarks, speculative RL algorithms, event-driven training, to name a few) are where the next wave of agent RL gets built.
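The fork/pause/snapshot/resume primitives above can be sketched with a toy in-memory sandbox. This is an illustrative sketch only — the class and method names are invented for this example, not Daytona's actual SDK, and a dict stands in for real VM state:

```python
import copy


class Sandbox:
    """Toy sandbox illustrating fork/pause/snapshot/resume.

    A real implementation would checkpoint a VM; here `state` is a dict
    standing in for caches, installed deps, and the working tree.
    """

    def __init__(self, state=None):
        self.state = state if state is not None else {}
        self.paused = False
        self._snapshots = {}

    def run(self, key, value):
        # Stand-in for executing an agent action inside the sandbox.
        if self.paused:
            raise RuntimeError("sandbox is paused; call resume() first")
        self.state[key] = value

    def fork(self):
        # Cheap branch: the child starts from an exact copy of parent
        # state, so a speculative rollout never disturbs the original.
        return Sandbox(copy.deepcopy(self.state))

    def snapshot(self, name):
        self._snapshots[name] = copy.deepcopy(self.state)

    def restore(self, name):
        self.state = copy.deepcopy(self._snapshots[name])

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False


# Speculative fork tree: branch mid-trajectory, try two strategies.
root = Sandbox()
root.run("deps_installed", True)
root.snapshot("hour-18")

branch_a = root.fork()
branch_a.run("strategy", "refactor")

branch_b = root.fork()
branch_b.run("strategy", "rewrite")

# The parent is untouched by either speculative branch.
assert "strategy" not in root.state
```

The point of the sketch is the lifecycle shape: once forking and restoring are cheap, mid-trajectory branching becomes an ordinary operation rather than an hours-long environment rebuild.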
🕒 5:20 pm – 5:30 pm
Talk "Hive: A Platform for Collaborative Agent Evolution"
🎤 Sijun Tan, PhD researcher at UC Berkeley’s Sky Computing Lab
Outline:
We introduce Hive, a platform for multi-agent, collaborative evolution of shared artifacts. Agents on Hive iteratively read prior runs, select branches, propose modifications, and submit results, with coordination emerging from versioned artifacts, benchmark-driven selection, and shared visibility. This induces a form of distributed evolutionary search over programs, where promising directions attract further iteration and weaker ones are abandoned. We observe rapid, compounding gains (e.g., 45% → 77% on Tau2-Bench, 25% → 53% on BabyVision-Lite) driven by many incremental contributions, positioning Hive as a step toward scalable collective intelligence and recursive self-improvement.
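The selection dynamic described above can be sketched in a few lines. This is a minimal illustration, not Hive's actual interface: the artifact representation, the random "propose" step, and the scoring are all invented stand-ins for an agent's edit plus a benchmark run:

```python
import random

random.seed(0)

# Each artifact is a (program, benchmark_score) pair. Agents pick a
# promising branch, propose a modification, and submit the result.
artifacts = [("baseline", 0.45)]


def propose(parent_program, parent_score):
    # Stand-in for an agent editing the program and re-benchmarking it:
    # a small random delta, clipped to [0, 1].
    delta = random.uniform(-0.05, 0.10)
    score = min(1.0, max(0.0, parent_score + delta))
    return (parent_program + "'", score)


for _ in range(50):
    # Benchmark-driven selection: iterate on the best-scoring branch;
    # weaker branches are simply never selected again.
    parent = max(artifacts, key=lambda a: a[1])
    artifacts.append(propose(*parent))

best = max(artifacts, key=lambda a: a[1])
assert best[1] > 0.45  # gains compound past the baseline
```

The design choice the sketch highlights is that no central planner is needed: selecting which branch to extend, based only on shared scores, is what makes many small contributions compound.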
🕒 5:30 pm – 5:40 pm
Talk "SkyRL: A Modular RL Library for LLMs"
🎤 Charlie Ruan, PhD researcher at UC Berkeley’s Sky Computing Lab
Outline:
In this talk, we’ll dive into SkyRL’s core design principles and recent efforts to “Tinker-ify” the framework—standardizing on a minimal, shared API that unlocks open-source network effects while pushing infrastructure concerns below the algorithm layer. We’ll also cover SkyRL’s integration with Harbor, which enables RL for terminal-use–style tasks while letting users focus primarily on the data recipe rather than framework internals.
🕒 5:40 pm – 5:50 pm
Talk "Frontier CS: Evolving Challenges for Evolving Intelligence"
🎤 Hanchen Li, PhD researcher at UC Berkeley’s Sky Computing Lab
Outline:
As continuously evolving agents become the de facto standard for autonomous systems like Claude Code, the need to rigorously evaluate their advanced problem-solving capabilities has grown critical. To address this, we introduce FrontierCS, a benchmark of 244 open-ended computer science problems curated by domain experts, including CS PhDs and elite competitive programmers.
Unlike existing benchmarks focused on tasks with known optimal answers, FrontierCS targets complex challenges—such as NP-hard algorithmic variants and open research questions—where the absolute optimal solution remains unknown, yet solution quality can be objectively measured via partial scoring. Models must solve these tasks by implementing executable programs rather than generating direct textual answers. Each problem includes an expert reference solution and an automated evaluator.
Empirically, we demonstrate that frontier reasoning models still lag significantly behind human experts, and simply scaling reasoning budgets fails to close this gap on open-ended tasks. By combining open-ended design with measurable progress, FrontierCS provides a vital new lens for evaluating the true limits of modern AI systems on authentic, frontier-level computer science problems.
🕒 5:50 pm – 6:00 pm
Talk "TBA"
🎤 Speaker TBA
Outline:
TBA
🕒 6:00 pm – 8:00 pm
Networking
With food and beverages
________________________
About the event
An engaging meetup designed for AI researchers to connect, share ideas, and explore the latest advancements in artificial intelligence. The event features informal networking, short talks, and discussions on current research trends, fostering collaboration and knowledge exchange within the AI community.
