Computer-Use Agents as Judges for Generative User Interface

📖 TL;DR

What does an agent-friendly UI look like? See the demo below:

The left UI is designed for 🧑🏻‍💻 humans, prioritizing aesthetics.

The right UI is redesigned for 🤖 agents, focusing on clarity and functionality.

Can Computer-Use Agents offer feedback that helps Coders generate UIs?

👨‍💻 Human Collaboration vs. 🤖 Coder-CUA Collaboration.
Left: Most GUIs are designed by humans and optimized for human user experience (e.g., aesthetics), which forces trained agents to adapt to human-oriented behaviors. Right: Our Coder-CUA Collaboration framework pairs the Coder as Designer with the CUA as Judge, enabling more reliable task execution and better usability for agents.
This is AUI: Agent-friendly UI.

🔧 How does it work?

Overview of the Coder-CUA Collaboration framework.
The process begins with the Coder as Designer, which initializes the UI and then iteratively revises it based on the user query and feedback. In parallel, the CUA as Judge performs task-driven navigation in the testing environment, producing trajectories and error logs that show whether each task is solvable. A verifier checks functional correctness, and feedback from CUA navigation drives the next UI revision. The collaboration ends with a finalized agent-centric UI optimized for both functionality and execution success.
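The loop described above can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' implementation: the callables `coder`, `verifier`, and `cua`, the `Trajectory` record, the feedback dictionary keys, and the `max_rounds` stopping budget are all assumptions made for illustration, and the toy stubs stand in for real model calls and a real testing environment. For simplicity, the sketch runs verification and CUA navigation one after the other within each round rather than in parallel.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Outcome of one CUA navigation attempt (hypothetical record format)."""
    task: str
    success: bool
    error_log: str = ""


def collaborate(query, tasks, coder, verifier, cua, max_rounds=3):
    """Alternate Coder (designer) and CUA (judge) until every task passes.

    All three callables are hypothetical stand-ins:
      coder(query, ui, feedback) -> ui   # ui is None on the first round
      verifier(ui, task) -> bool         # is the task's function implemented?
      cua(ui, task) -> Trajectory        # navigate the UI to attempt the task
    Returns the final UI and the number of rounds used.
    """
    ui, feedback = None, None
    for round_idx in range(1, max_rounds + 1):
        ui = coder(query, ui, feedback)  # initialize, then revise
        missing = [t for t in tasks if not verifier(ui, t)]
        solvable = [t for t in tasks if t not in missing]
        failures = [tr for tr in (cua(ui, t) for t in solvable) if not tr.success]
        if not missing and not failures:
            return ui, round_idx  # every task implemented and solved by the CUA
        # Feedback for the next revision: missing functions plus CUA error logs.
        feedback = {
            "missing_functions": missing,
            "cua_errors": [(tr.task, tr.error_log) for tr in failures],
        }
    return ui, max_rounds  # budget exhausted; return the latest revision


# Toy stubs: a "UI" is a set of implemented features plus a labeling flag.
def toy_coder(query, ui, feedback):
    if ui is None:
        return {"features": {"search"}, "labeled": False}
    return {
        "features": ui["features"] | set(feedback["missing_functions"]),
        "labeled": ui["labeled"] or bool(feedback["cua_errors"]),
    }


def toy_verifier(ui, task):
    return task in ui["features"]


def toy_cua(ui, task):
    if ui["labeled"]:
        return Trajectory(task, True)
    return Trajectory(task, False, "could not locate an unlabeled control")


final_ui, rounds = collaborate("shop page", ["search", "checkout"],
                               toy_coder, toy_verifier, toy_cua)
print(rounds, sorted(final_ui["features"]), final_ui["labeled"])
# prints: 2 ['checkout', 'search'] True
```

With the toy stubs, the first UI lacks `checkout` and its controls are unlabeled, so the CUA fails on `search`; the second revision fixes both problems and the loop stops after two rounds.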

📊 New UIs Improve Success Rate

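We evaluate our framework with two metrics: Function Completeness Rate (FC) and CUA Success Rate (SR). Compared with the baseline, Coder-CUA collaboration improves both metrics for every coder tested, though the size of the gain varies by model: FC rises most for Qwen3-Coder-30B (+18.0 points), and SR rises most for Qwen3-Coder-30B (+11.7 points) and Gemini-3-Pro (+11.2 points), with Gemini-3-Pro reaching the highest SR overall (47.0%).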
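To make the relationship between the two metrics concrete, here is a minimal Python sketch. It assumes, without confirmation from this page, that FC is the share of tasks whose required function the verifier finds implemented and SR is the share of tasks the CUA completes, both computed over the same task set. The `TaskResult` record and the `fc_and_sr` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Per-task outcome (hypothetical record format)."""
    functionally_complete: bool  # verifier found the required function in the UI
    cua_success: bool            # the CUA completed the task by navigating the UI


def fc_and_sr(results):
    """Return (FC %, SR %), assuming both are plain percentages over one task set."""
    if not results:
        raise ValueError("need at least one task result")
    n = len(results)
    fc = 100.0 * sum(r.functionally_complete for r in results) / n
    sr = 100.0 * sum(r.cua_success for r in results) / n
    return fc, sr


results = [
    TaskResult(True, True),
    TaskResult(True, False),
    TaskResult(True, False),
    TaskResult(False, False),
]
print(fc_and_sr(results))  # prints: (75.0, 25.0)
```

Under this reading, a UI can be functionally complete for a task while the CUA still fails to finish it, which is the gap that CUA feedback is meant to close.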

Overall Performance
Coder	Method	Func. Completeness, FC (%)	CUA Success Rate, SR (%)
GPT-5	Baseline	67.9	24.5
GPT-5	+ Ours	81.5	26.0
Qwen3-Coder-30B	Baseline	42.1	7.3
Qwen3-Coder-30B	+ Ours	60.1	19.0
GPT-4o	Baseline	36.3	8.8
GPT-4o	+ Ours	43.1	16.1
Gemini-3-Pro	Baseline	71.7	35.8
Gemini-3-Pro	+ Ours	72.5	47.0
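You can read the per-model gains directly from the table. The short Python snippet below recomputes them as absolute differences in percentage points, using only the numbers in the table. The `RESULTS` dictionary and the `gains` helper are illustrative names.

```python
# Scores copied from the table: (FC baseline, FC ours, SR baseline, SR ours)
RESULTS = {
    "GPT-5": (67.9, 81.5, 24.5, 26.0),
    "Qwen3-Coder-30B": (42.1, 60.1, 7.3, 19.0),
    "GPT-4o": (36.3, 43.1, 8.8, 16.1),
    "Gemini-3-Pro": (71.7, 72.5, 35.8, 47.0),
}


def gains(results):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return {
        coder: (round(fc1 - fc0, 1), round(sr1 - sr0, 1))
        for coder, (fc0, fc1, sr0, sr1) in results.items()
    }


for coder, (d_fc, d_sr) in gains(RESULTS).items():
    print(f"{coder:16s} FC {d_fc:+.1f}  SR {d_sr:+.1f}")
```

Rounding to one decimal removes floating-point noise such as `81.5 - 67.9` evaluating to slightly less than 13.6.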

🎓 Citation

@misc{lin2025aui,
  title={Computer-Use Agents as Judges for Generative User Interface},
  author={Kevin Qinghong Lin and Siyuan Hu and Linjie Li and Zhengyuan Yang and Lijuan Wang and Philip Torr and Mike Zheng Shou},
  year={2025},
  eprint={2511.15567},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.15567},
}