ClickConf
ClickConf is a live event series for developers of computer use agents. The inaugural topic is "Bridging the capability-reliability gap: computer use in research and production." It will feature speakers from OpenAGI (creators of Lux) and researchers working on computer use in healthcare, among others. Come hear what it takes to move beyond experimental demos and build robust agentic systems that actually work in the real world.
Speakers:
Shahul ES, founder of Vibrant Labs & Ragas
Our benchmarks study how well agents complete long-horizon workflows across multiple applications and interfaces. We'll discuss our approach to automating the creation of post-training data and RL environments using agent world models and large-scale task/verifier mining.
Michael Wornow, Kinetic Systems & HealthAdminBench
We'll cover HealthAdminBench, created in partnership with Stanford Hospital to benchmark computer-use agents on real healthcare admin tasks across EHRs, payer portals, prior auth, and insurance verification.
Meng Song, Pinetree Research
Are current computer-use benchmarks measuring general capability, or just benchmark adaptation? In our experience, achieving strong performance on different benchmarks often requires benchmark-specific hacks, and the gains from those hacks frequently fail to transfer to other benchmarks, let alone to real-world scenarios. In this talk, I will connect this observation to broader machine learning themes of shortcut learning and underspecification, and discuss what this means for generalization in computer use and why evaluation needs to move beyond single-benchmark success.
Zengyi Qin, OpenAGI
OpenAGI is the creator of Lux ("A computer use love letter to developers"), a state-of-the-art computer use model and developer toolkit.
Eric Tse, co-founder of Nen
Nen is excited to be releasing an open source project at ClickConf.
Light refreshments will be provided.
Sponsored by: Nen (getnen.ai)
