AgentHijack Benchmark Exposes Fragility of Computer-Use AI Agents to Environment Corruptions

May 26, 2026 · 20 views · AgentHijack ICML 2026 computer-use agents AI safety benchmark

The Hidden Vulnerability in Autonomous Computer Control

The race to build AI agents that can control computer interfaces, from clicking buttons to filling in forms, accelerated sharply through 2025 and 2026. Companies such as Anthropic (with Claude's Computer Use) and OpenAI (with Operator), along with numerous startups, have released prototypes that promise to automate tedious digital tasks. A new benchmark, AgentHijack, accepted at ICML 2026, suggests that these agents are alarmingly brittle under even minor environment corruptions. The paper, authored by Jingwei Sun, Jianing Zhu, Yuanyi Li, Tongliang Liu, Xia Hu, and Bo Han, systematically evaluates how computer-use agents handle common disruptions such as window resizing, screen occlusions, icon rearrangement, and display noise. The results are stark: most agents break down completely, which raises serious questions about their reliability for unsupervised deployment.

What Is AgentHijack?

AgentHijack is a benchmark suite designed to measure the robustness of AI agents that interact with graphical user interfaces (GUIs) via pixel-level input and output. Unlike existing benchmarks that focus on task completion in pristine environments, AgentHijack introduces a taxonomy of 50 common environment corruptions grouped into four categories: visual noise (e.g., static overlay, brightness shifts), layout changes (e.g., window resize, scrolling), state disruptions (e.g., modal dialogs, network latency), and adversarial perturbations (e.g., subtle pixel modifications). For each corruption, the benchmark measures task success rate, number of steps, and recovery time. The authors tested four representative agent architectures, including two commercial endpoints, and found that average success rates dropped from 72% in clean settings to below 15% under moderate corruption levels. Notably, even simple layout shifts like a 10% window resize caused over 40% of agents to fail entirely.
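As a rough illustration of how results of this kind might be tabulated, the sketch below aggregates episode outcomes into per-category success rates and computes the relative drop between clean and corrupted settings. The category names follow the taxonomy described above; the `Episode` record and both helper functions are hypothetical and are not taken from the AgentHijack codebase.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CLEAN = "clean"
    VISUAL_NOISE = "visual noise"
    LAYOUT_CHANGE = "layout change"
    STATE_DISRUPTION = "state disruption"
    ADVERSARIAL = "adversarial perturbation"


@dataclass(frozen=True)
class Episode:
    """One task attempt (hypothetical record layout)."""
    category: Category
    success: bool
    steps: int


def success_rate_by_category(episodes):
    """Return {Category: fraction of successful episodes}."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for ep in episodes:
        totals[ep.category] += 1
        wins[ep.category] += int(ep.success)
    return {cat: wins[cat] / totals[cat] for cat in totals}


def relative_drop(clean_rate, corrupted_rate):
    """Fraction of clean-setting performance lost under corruption."""
    if clean_rate <= 0:
        raise ValueError("clean_rate must be positive")
    return (clean_rate - corrupted_rate) / clean_rate
```

With the reported figures, `relative_drop(0.72, 0.15)` is about 0.79, so a fall from 72% to 15% means roughly four fifths of clean-setting performance is lost.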

Key Findings: Brittle by Design

The paper's most striking finding is that current computer-use agents lack any form of environment awareness or error recovery mechanism. They operate under an implicit assumption of a static, predictable interface, a luxury the real world rarely affords. When a dialog box appears unexpectedly, agents repeatedly try the same action instead of adapting, often triggering cascading failures. One experiment showed that a single extra pop-up window caused the agent to fail on 8 out of 10 subsequent tasks. Another experiment tested resilience to screen resolution changes: agents trained on 1920x1080 displays dropped to 23% accuracy when tested on 1366x768, even though the underlying application remained identical. The authors note that no existing agent incorporates any form of invariance to scaling or aspect ratio changes, a capability that is trivial for human users.
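One simple way to think about scale invariance is to store click targets as fractions of the screen size rather than as absolute pixels. The sketch below is only an illustration of that idea, not a method from the paper; `to_normalized` and `to_pixels` are hypothetical helper names, and the approach assumes the interface scales proportionally, which does not hold when a layout reflows.

```python
def to_normalized(x, y, width, height):
    """Convert absolute pixel coordinates to fractions of the screen size."""
    if width <= 0 or height <= 0:
        raise ValueError("screen dimensions must be positive")
    return x / width, y / height


def to_pixels(nx, ny, width, height):
    """Convert normalized coordinates back to pixels, clamped to the screen."""
    px = min(width - 1, max(0, int(nx * width)))
    py = min(height - 1, max(0, int(ny * height)))
    return px, py


# A click recorded at the centre of a 1920x1080 screen...
nx, ny = to_normalized(960, 540, 1920, 1080)
# ...replayed on a 1366x768 screen.
print(to_pixels(nx, ny, 1366, 768))  # (683, 384)
```

Normalized coordinates handle uniform rescaling only. If elements move relative to one another, the agent needs to re-locate the target from the current screenshot instead of replaying a stored position.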

Why This Matters for AI Deployment

The brittleness exposed by AgentHijack is not just an academic curiosity. Enterprise automation platforms are beginning to deploy computer-use agents for tasks like data entry, invoice processing, and customer support. If a simple window resize can derail an agent, the risk of costly mistakes or security vulnerabilities is substantial. The authors explicitly warn that current agents could be easily hijacked by an attacker who subtly alters the rendering of a webpage or application. For example, hiding a 'Cancel' button behind an invisible overlay could cause an agent to authorize a transaction it was meant to deny. In a companion paper on arXiv (Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures), the same vulnerability space is explored for always-on assistants, confirming that the problem is widespread. The AgentHijack benchmark thus serves as a wake-up call for the research community to treat robustness as a first-class requirement, not an afterthought.
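To make the overlay scenario concrete, the following sketch performs a hit test before clicking: it checks which element is actually topmost at the click point, regardless of whether that element is visible. It assumes the agent can read a structured element list (for instance from an accessibility tree or DOM), which a purely pixel-based agent would not have. The `Element` class and both functions are hypothetical and only illustrate the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A rectangular UI element (hypothetical, simplified model)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    z_index: int
    opacity: float = 1.0

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def topmost_at(elements, px, py):
    """Return the element with the highest z_index covering (px, py), or None."""
    hits = [e for e in elements if e.contains(px, py)]
    return max(hits, key=lambda e: e.z_index, default=None)


def click_is_safe(elements, target_name, px, py):
    """True only if a click at (px, py) would actually land on the target."""
    top = topmost_at(elements, px, py)
    return top is not None and top.name == target_name


cancel = Element("cancel_button", x=100, y=200, width=80, height=30, z_index=1)
# Fully transparent element stacked above the whole window.
overlay = Element("overlay", x=0, y=0, width=800, height=600,
                  z_index=99, opacity=0.0)

print(click_is_safe([cancel], "cancel_button", 120, 210))           # True
print(click_is_safe([cancel, overlay], "cancel_button", 120, 210))  # False
```

The check deliberately ignores `opacity`: an invisible element still intercepts the click, which is exactly the mismatch between what the agent sees and what the interface does.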

Toward More Resilient Agents

The paper does not just diagnose problems; it also suggests remedies. The authors propose three avenues for improvement: (1) invariant representation learning to make agents insensitive to common visual distortions; (2) adaptive action spaces that can detect when a planned action fails and switch to recovery behaviors; and (3) environment state monitoring to explicitly model UI changes. They implement a simple baseline that uses a pre-trained vision encoder with data augmentation and a fallback policy, which restores success rates to 58% under moderate corruption. That is a significant gain, but still far from the 90%+ reliability needed for production. The benchmark is open-sourced at the paper's project page, allowing the community to test and improve their own agents. Given the rapid adoption of autonomous computer-use agents, AgentHijack is likely to become a standard evaluation suite, much like adversarial robustness benchmarks for image classifiers. The next year will determine whether the field can build agents that are not only capable, but truly robust.
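The second and third avenues can be sketched together as a loop that checks whether an action changed the observed state and switches to a recovery action when it did not. This is a minimal sketch under stated assumptions, not the authors' baseline: `ToyEnv`, `state_fingerprint`, and `act_with_fallback` are hypothetical names, and a real agent would compare screenshots or UI trees, not short strings.

```python
import hashlib


def state_fingerprint(observation):
    """Hash an observation so that two states can be compared cheaply."""
    return hashlib.sha256(observation.encode("utf-8")).hexdigest()


def act_with_fallback(env, action, fallback_actions, max_retries=2):
    """Try `action`; if the observed state never changes, try fallbacks in turn.

    Returns the action that changed the state, or None if nothing did.
    """
    for candidate in [action] * max_retries + list(fallback_actions):
        before = state_fingerprint(env.observe())
        env.step(candidate)
        if state_fingerprint(env.observe()) != before:
            return candidate
    return None


class ToyEnv:
    """Hypothetical environment: a modal dialog blocks the form until dismissed."""

    def __init__(self):
        self.dialog_open = True
        self.submitted = False

    def observe(self):
        return f"dialog={self.dialog_open} submitted={self.submitted}"

    def step(self, action):
        if action == "dismiss_dialog":
            self.dialog_open = False
        elif action == "click_submit" and not self.dialog_open:
            self.submitted = True


env = ToyEnv()
print(act_with_fallback(env, "click_submit", ["dismiss_dialog"]))  # dismiss_dialog
print(act_with_fallback(env, "click_submit", ["dismiss_dialog"]))  # click_submit
```

Capping retries at a small number is the design choice that matters here: it stops the agent from repeating a blocked action indefinitely and forces it to try something else.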

Source: arXiv AI

345tool Editorial Team

We are a team of AI technology enthusiasts and researchers dedicated to discovering, testing, and reviewing the latest AI tools to help users find the right solutions for their needs.
