ProCUA-SFT: Addressing Negative Transfer in Computer-Use Agent Training

Researchers introduce ProCUA-SFT, a large-scale dataset designed to overcome the performance degradation observed when fine-tuning Computer-Use Agents (CUAs) on existing public trajectory datasets.

The Challenge of Computer-Use Agents (CUAs)

Computer-Use Agents (CUAs) are specialized AI models designed to interact with graphical user interfaces (GUIs) by processing screenshots and executing keyboard and mouse actions. To achieve proficiency, these models require vast amounts of diverse trajectory data collected from authentic desktop environments. However, the availability of high-quality, large-scale training data remains a significant bottleneck in the development of robust agents.
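To make "trajectory data" concrete, here is a minimal sketch of how one recorded episode might be represented and unrolled into supervised training pairs. It is illustrative only: the class names (Action, Step, Trajectory), the three-action vocabulary, and the text format of the targets are assumptions made for this example, not the schema used by AgentNet, UI-TARS, or ProCUA-SFT.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Action:
    # Hypothetical action vocabulary: "click", "type", or "key".
    kind: str
    position: Optional[Tuple[int, int]] = None  # pixel coordinates for clicks
    text: Optional[str] = None                  # typed text or key name


@dataclass
class Step:
    screenshot_path: str  # what the agent saw before acting
    action: Action        # what the demonstrator did next


@dataclass
class Trajectory:
    instruction: str
    steps: List[Step] = field(default_factory=list)


def format_action(action: Action) -> str:
    """Serialize an action as text (an assumed format, for illustration)."""
    if action.kind == "click":
        if action.position is None:
            raise ValueError("click needs a position")
        x, y = action.position
        return f"click({x}, {y})"
    if action.kind in ("type", "key"):
        return f"{action.kind}({action.text!r})"
    raise ValueError(f"unknown action kind: {action.kind}")


def to_sft_pairs(traj: Trajectory) -> List[Tuple[Dict[str, object], str]]:
    """Unroll a trajectory into (model input, target action) pairs."""
    pairs = []
    history: List[str] = []
    for step in traj.steps:
        model_input = {
            "instruction": traj.instruction,
            "screenshot": step.screenshot_path,
            "history": list(history),  # actions taken before this step
        }
        target = format_action(step.action)
        pairs.append((model_input, target))
        history.append(target)
    return pairs
```

Under this sketch, each step yields one training example: the model sees the instruction, the current screenshot, and the prior actions, and is trained to emit the next action.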

The Phenomenon of Negative Transfer

The technical report identifies a problem with existing public datasets. AgentNet, the largest public resource with 22.5K human trajectories, unexpectedly produced "negative transfer" when used for Supervised Fine-Tuning (SFT): the additional training made the model worse rather than better. Specifically, when the UI-TARS 7B model was further trained on AgentNet, its success rate on the OSWorld benchmark fell from 26.3% to between 8% and 10%.
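The size of that drop is easier to judge in relative terms. The short calculation below uses only the figures quoted above (26.3% before, 8% to 10% after); the function name success_rate_drop is a hypothetical helper introduced for this example.

```python
def success_rate_drop(before: float, after: float) -> tuple:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = before - after
    relative = absolute / before
    return absolute, relative


baseline = 26.3  # UI-TARS 7B on OSWorld before AgentNet fine-tuning (from the report)

# The report gives a range of 8% to 10% after fine-tuning, so evaluate both ends.
for after in (10.0, 8.0):
    abs_drop, rel_drop = success_rate_drop(baseline, after)
    print(f"{baseline}% -> {after}%: {abs_drop:.1f} points lost, {rel_drop:.0%} relative drop")
```

In other words, the fine-tuned model lost roughly 16 to 18 percentage points, or about 62% to 70% of its original success rate.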

Introducing ProCUA-SFT

To mitigate this performance drop and provide a more scalable training foundation, the authors present ProCUA-SFT. The dataset consists of 3.1 million synthetic trajectories, roughly 138 times the 22.5K human trajectories in AgentNet, and is intended to provide the scale and diversity needed to stabilize and improve training for agents that operate in complex desktop environments.

