Cory Cornelius, Marius Arvinte, Sebastian Szyller, Weilin Xu, and Nageen Himayat are on the Trusted & Distributed Intelligence team for Security and Privacy Research at Intel Labs.
Highlights
- Adversarial examples can force computer-use artificial intelligence (AI) agents to execute arbitrary code.
- To aid AI researchers in evaluating the robustness of agentic models, Intel Labs researchers open sourced an adversarial image injection proof of concept (PoC) against computer-use AI agents such as UI-TARS.
- Ultimately, a security guardrail is required before giving AI agents control over computers.
Intel Labs researchers demonstrated at Intel Vision 2025 that adversaries can force computer-use AI agents to execute arbitrary commands by displaying an adversarial image on the screen. By augmenting the image with a perturbation that is imperceptible to the human eye, an attacker can mislead the AI agent into carrying out the attacker's instructions. This fundamental deep learning problem has remained unsolved after more than a decade of research. To help the research community validate the issue and develop countermeasures against attacks on computer-use AI agents such as the open source UI-TARS, Intel Labs has open sourced an adversarial image injection proof of concept (PoC) built on two libraries: the Large Language Model Adversarial Robustness Toolkit (LLMart) and the Modular Adversarial Robustness Toolkit (MART).
Finding ways to improve the robustness of AI models is critical as reasoning models become more advanced. Computer-use AI agents can interpret and interact with visual elements on a screen, allowing them to perform tasks on behalf of users, such as navigating applications to book airline tickets or look up the weather at a vacation destination. Malicious actors can exploit this autonomous decision making by crafting adversarial examples that divert AI agents from faithfully executing the user's instructions. For evaluating the robustness of computer-use AI agents, LLMart provides a convenient pipeline for computing gradients with respect to the image input, while MART composes bounded perturbations in the image domain. By combining the two tools, researchers can optimize perturbations that probe the robustness of agentic AI systems.
The Target: UI-TARS Computer-Use AI Agent
Computer-use AI agents, such as the open-source UI-TARS, monitor the computer screen and follow user instructions by controlling the keyboard and mouse to perform the requested task. For example, when a user prompts “What’s the current weather in Portland, OR?”, UI-TARS might open a browser, navigate to a weather website, look up the forecast by city name, and display the result on the screen. Currently, users can run the quantized 2B and 7B variants of UI-TARS locally on Intel AI PCs, enabling privacy-sensitive uses of these agents.
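As a rough illustration of how such an agent is driven, the sketch below queries the UI-TARS-2B checkpoint on Hugging Face as a generic image-text-to-text model and asks it for its next action given a screenshot. This is not the official UI-TARS agent loop; the auto classes assume a recent transformers release, and the screenshot filename is a placeholder for a capture of the current screen.

```python
# Minimal sketch: query UI-TARS-2B for its next GUI action given a screenshot.
# Not the official UI-TARS agent loop; the auto classes assume a recent
# transformers release, and "screenshot.png" is a hypothetical screen capture.
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "ByteDance-Seed/UI-TARS-2B-SFT"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What's the current weather in Portland, OR?"},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
screenshot = Image.open("screenshot.png")

inputs = processor(text=[prompt], images=[screenshot], return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])  # model's proposed action
```

In a full agent, the generated action would then be parsed and executed with the keyboard and mouse, and a new screenshot would be fed back to the model for the next step.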
Attacking UI-TARS with Adversarial Images
The input to UI-TARS consists of a text prompt and a screenshot image. As a result, the AI agent is susceptible not only to prompt injection attacks but also to adversarial examples in the image modality. According to IEEE Spectrum, a textual prompt injection attack is not a realistic threat in this scenario because the adversary would need to type the prompt on the target’s computer, which would mean the bad actor already has direct access to the machine.
However, an adversary can get a malicious image onto the target user’s screen in several ways: by controlling a website the user visits, by paying to advertise the image on such a site, or simply by sending it in an email. Intel Labs researchers found that these adversarial images can force the AI agent to perform any task specified by the attacker, regardless of the user’s prompt.
In the PoC, the team demonstrates that an adversary can force UI-TARS to print the message “You have been hacked!” from the command line, even though the user’s text prompt asks for weather information (see Figure 1). The deviation is caused by an adversarial image shown on the screen and is effective regardless of the user’s instruction. The PoC exploits the fact that the agent will execute multiple commands.
Figure 1. The UI-TARS-7B computer-use AI agent prints the message “You have been hacked!” when asked about the weather in the screenshot. The adversarial perturbation injected into the image was optimized via gradient descent using the Intel Labs PoC.
Implementation Details: LLMart and MART
Available on GitHub, the PoC source code is built on Intel Labs’ open source LLMart and MART libraries. The LLMart toolkit provides a convenient “ImageTextToLoss” pipeline for computing gradients with respect to the image input. This pipeline works for many image-text-to-text generation models in the Hugging Face transformers library, including the UI-TARS family. MART provides an interface for composing Lp-norm-bounded perturbations in the image domain, which yields changes that are imperceptible to the human eye. Combining LLMart with MART, it is straightforward to write a gradient descent loop that optimizes a perturbation forcing UI-TARS to open the calculator app. The example code uses the smallest UI-TARS-2B model as the target, so the optimization workload fits into the memory of a single device; the same attack also works on larger models given enough VRAM.
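The full optimization loop lives in the PoC repository; the snippet below is a simplified, library-agnostic sketch of what such a loop does in plain PyTorch. Here `image_to_loss` is a hypothetical stand-in for the loss returned by LLMart’s ImageTextToLoss pipeline (the cross-entropy of the attacker’s target output), and the clamping step plays the role of the L-infinity-bounded perturbation that MART composes onto the image.

```python
# Simplified sketch of the PoC's optimization, in plain PyTorch.
# `image_to_loss` is a hypothetical stand-in for the loss that LLMart's
# ImageTextToLoss pipeline returns: the cross-entropy of the attacker's
# target output given the (perturbed) screenshot. The clamp on `delta`
# plays the role of the L-infinity-bounded perturbation composed by MART.
import torch

def optimize_perturbation(image_to_loss, screenshot, eps=8 / 255, lr=1 / 255, steps=500):
    """Projected gradient descent on an additive, L-infinity-bounded perturbation.

    screenshot: float tensor in [0, 1] with shape (C, H, W).
    Returns the adversarial screenshot.
    """
    delta = torch.zeros_like(screenshot, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        adv = (screenshot + delta).clamp(0.0, 1.0)  # keep a valid image
        loss = image_to_loss(adv)                   # low loss => agent emits the target action
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                 # project back into the imperceptible budget

    return (screenshot + delta).detach().clamp(0.0, 1.0)
```

An L-infinity budget of only a few 8-bit intensity levels keeps the perturbation imperceptible to the human eye while still giving the optimizer enough freedom to steer the agent toward the attacker’s action.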
On GitHub, LLMart is available under the Apache 2.0 license, while MART is available under the BSD 3-Clause license.
Looking Forward
The vulnerability of computer-use AI agents to adversarial examples remains an open problem, and the same issue may exist in proprietary products. Using the Intel Labs PoC, the community can independently validate similar systems. Even though the fundamental problem of adversarial examples remains hard to solve, a security guardrail such as isolation via Intel® SGX can limit what an AI agent is allowed to do, reducing the attack surface.
References
- ByteDance-Seed/UI-TARS-2B-SFT. Hugging Face. (n.d.). https://huggingface.co/ByteDance-Seed/UI-TARS-2B-SFT
- Intel Labs. (n.d.). IntelLabs/LLMart: LLM Adversarial Robustness Toolkit, a toolkit for evaluating LLM robustness through adversarial testing. GitHub. https://github.com/IntelLabs/llmart
- Intel Labs. (n.d.). IntelLabs/MART: Modular Adversarial Robustness Toolkit. GitHub. https://github.com/IntelLabs/MART
- Strickland, E. (2025, February 13). Are You Ready to Let an AI Agent Use Your Computer? IEEE Spectrum. https://spectrum.ieee.org/ai-agents-computer-use
- Intel Labs. (n.d.). LLMart/examples/vlm at main · IntelLabs/LLMart. GitHub. https://github.com/IntelLabs/LLMart/tree/main/examples/vlm