AG2 (formerly AutoGen)
The Open-Source AgentOS. Join our community of 20k+ agent builders: https://discord.gg/pAbnFJrkgZ

CoAct-1: Blending Action and Intelligence

Google Meet
About Event

Title: CoAct-1: Blending Action and Intelligence

Abstract: This talk covers CoAct-1, a novel multi-agent system implemented with AG2 that enhances computer automation by treating coding as a core action alongside traditional GUI manipulation. The system is designed to overcome the inefficiency and brittleness of GUI-only agents on complex, long-horizon tasks. CoAct-1 uses an Orchestrator to decompose tasks and delegate each subtask either to a GUI Operator for visual interactions or to a Programmer agent that writes and executes Python or Bash scripts for backend operations such as file management or data processing. This hybrid approach lets the agent bypass inefficient GUI action sequences in favor of more robust code execution. The paper shows that on the OSWorld benchmark, CoAct-1 achieves a new state-of-the-art success rate of 60.76% and significantly reduces the average number of steps needed to complete tasks compared to leading GUI-only agents.
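The Orchestrator-delegation pattern described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CoAct-1's implementation or AG2's API: the names `Subtask`, `orchestrate`, `programmer`, and `gui_operator` are hypothetical, the task decomposition is written by hand instead of being produced by a model, and the GUI operator is a stub that only reports the action it would take. The programmer runs only Python here, although the abstract notes that the real agent also writes Bash.

```python
# Hypothetical sketch of orchestrator-style delegation; NOT CoAct-1's or AG2's real API.
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """One step of a (hypothetical) hand-written task decomposition."""
    description: str
    kind: str     # "code" for backend work, "gui" for visual interaction
    payload: str  # a Python script for "code", an action string for "gui"


def programmer(task: Subtask) -> str:
    """Run the subtask's Python script in a subprocess and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", task.payload],
        capture_output=True, text=True, timeout=30, check=True,
    )
    return result.stdout.strip()


def gui_operator(task: Subtask) -> str:
    """Stub GUI operator: a real one would click and type on screen."""
    return f"performed GUI action: {task.payload}"


def orchestrate(subtasks: List[Subtask]) -> List[str]:
    """Delegate each subtask to the programmer or the GUI operator by its kind."""
    routes: Dict[str, Callable[[Subtask], str]] = {
        "code": programmer,
        "gui": gui_operator,
    }
    results = []
    for task in subtasks:
        handler = routes.get(task.kind)
        if handler is None:
            raise ValueError(f"unknown subtask kind: {task.kind!r}")
        results.append(handler(task))
    return results


if __name__ == "__main__":
    plan = [
        Subtask("count CSV rows", "code", "print(len(['a,1', 'b,2', 'c,3']))"),
        Subtask("open the report", "gui", "double-click report.xlsx"),
    ]
    for line in orchestrate(plan):
        print(line)
```

The row count is computed by a script rather than by stepping through a spreadsheet UI, which mirrors the abstract's point that code execution can replace long, brittle GUI sequences for backend work.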

Speaker Bio: Linxin Song is a second-year computer science PhD student at the University of Southern California (USC), advised by Professor Jieyu Zhao. His research interests center on natural language processing (NLP) and synthetic data: specifically, evaluating large language and vision-language models across domains, extending their capabilities at minimal cost, and enabling their safe, efficient, and effective collaboration on real-world problems.

Date & Time:
Oct 30, 9am PT
