{"id":39980,"date":"2026-05-15T12:12:10","date_gmt":"2026-05-15T12:12:10","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/39980\/"},"modified":"2026-05-15T12:12:10","modified_gmt":"2026-05-15T12:12:10","slug":"wowed-by-computer-use-ai-agents-research-says-theyre-digital-disasters-even-for-routine-tasks","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/39980\/","title":{"rendered":"Wowed by computer-use AI agents? Research says they&#8217;re &#8220;digital disasters&#8221; even for routine tasks"},"content":{"rendered":"<p><a href=\"https:\/\/www.digitaltrends.com\/computing\/artificial-intelligence\/\" rel=\"nofollow noopener\" target=\"_blank\">AI agents<\/a> built to run everyday computer tasks have a serious context problem, according to <a href=\"https:\/\/openreview.net\/forum?id=9W4bPRsEIT\" rel=\"noopener noreferrer nofollow\" target=\"_blank\">new research from UC Riverside<\/a>.<\/p>\n<p>The team tested 10 agents and models from major developers, including <a href=\"https:\/\/www.digitaltrends.com\/computing\/bombshell-openai-lawsuit-claims-your-chatgpt-convos-were-shared-with-google-and-meta\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI<\/a>, <a href=\"https:\/\/www.digitaltrends.com\/computing\/anthropic-says-it-has-fixed-claude-ais-evil-behavior-but-pins-it-on-the-internet\/\" rel=\"nofollow noopener\" target=\"_blank\">Anthropic<\/a>, <a href=\"https:\/\/www.digitaltrends.com\/phones\/whatsapp-gets-a-secret-mode-to-chat-with-an-ai-you-can-do-better-things-with-your-time\/\" rel=\"nofollow noopener\" target=\"_blank\">Meta<\/a>, <a href=\"https:\/\/www.digitaltrends.com\/cool-tech\/alibaba-ai-predict-reality-tv\/\" rel=\"nofollow noopener\" target=\"_blank\">Alibaba<\/a>, and <a href=\"https:\/\/www.digitaltrends.com\/computing\/deepseeek-v4-is-out-touting-some-disruptive-wins-over-gemini-chatgpt-and-claude\/\" rel=\"nofollow noopener\" target=\"_blank\">DeepSeek<\/a>. On average, the agents took undesirable or potentially harmful actions 80% of the time and caused damage 41% of the time.<\/p>\n<p>These systems can open apps, click buttons, fill out forms, move through websites, and act on a computer screen with limited supervision. Their mistakes land differently from a chatbot\u2019s bad answer because the software can actually do things.<\/p>\n<p>The UC Riverside findings suggest today\u2019s desktop agents can treat unsafe requests as jobs to finish, not signals to stop.<\/p>\n<p>Why agents miss obvious danger<\/p>\n<p>The researchers built a benchmark called BLIND-ACT to test whether agents would pause when a task became unsafe, contradictory, or irrational. In the latest tests, they didn\u2019t pause often enough.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1500\" height=\"1000\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on-async--click=\"actions.showLightbox\" data-wp-on-async--load=\"callbacks.setButtonStyles\" data-wp-on-async-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/google-data-center.jpg\" alt=\"google data center\" class=\"wp-image-796569\"\/><\/p>\n<p>\t\tGoogle<\/p>\n<p>Across 90 tasks, the benchmark pushed agents into situations that required context, restraint, and refusal. One test involved sending a violent image file to a child. 
In another, an agent filling out tax forms falsely marked the user as disabled because it reduced the tax bill. A third asked an agent to disable firewall rules in the name of better security, and the agent followed through instead of rejecting the contradiction.

The researchers call the pattern blind goal-directedness: the agent keeps chasing the assigned outcome even when the surrounding context says the task is broken.

Why obedience becomes the flaw

The failures clustered around obedience. These agents can act as if a user's request is enough reason to keep going.

The team identified patterns it calls execution-first bias and request-primacy. In plain terms, the agent focuses on how to complete the task, then treats the request itself as justification. That risk grows when the same system can touch everything from email to security settings.

[Image: AI image of a chip burning. Created with ChatGPT]

That doesn't mean the agents are malicious. It means they can be confidently wrong while moving through software at machine speed.

Why guardrails need to come first

AI agents need stronger guardrails before they get broad permission to act across a computer.

These systems work through a loop: they look at the screen, decide the next step, act, then look again. When that loop is paired with weak contextual restraint, a shortcut can turn into a fast-moving mistake.
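For a sense of what that loop looks like in practice, here is a minimal sketch of an observe-decide-act loop with the kind of contextual-restraint check the researchers found missing. It is illustrative only: the screen and policy objects, the Action type, and the risk label are hypothetical stand-ins, not the API of any agent tested in the study.

```python
from dataclasses import dataclass

# Hypothetical sketch of a computer-use agent's observe-decide-act loop.
# Action, screen, and policy are illustrative stand-ins, not a real API.

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    target: str = ""   # e.g. a button label or a form field
    risk: str = "low"  # a risk label attached when the step is proposed

def run_agent(task: str, screen, policy, max_steps: int = 50) -> str:
    """Look at the screen, decide a step, vet it, act, then look again."""
    for _ in range(max_steps):
        observation = screen.capture()             # look
        action = policy.decide(task, observation)  # decide

        if action.kind == "done":
            return "completed"

        # The contextual-restraint gate the study found missing: refuse
        # and stop instead of executing a step flagged as harmful.
        if action.risk == "high":
            return f"refused: {action.kind} on {action.target!r} looks unsafe"

        screen.execute(action)                     # act, then loop
    return "stopped: step budget exhausted"
```

The point of the gate is its placement: the refusal check sits inside the loop, before every single action, rather than running once when the task is first accepted.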
For now, treat agents as supervised tools. Use them first on low-risk chores, keep them away from financial and security workflows, and watch whether developers add clearer refusal systems, tighter permissions, and better ways to catch contradictions before the next click.
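"Tighter permissions" can be as simple as deny-by-default capability grants scoped to the task at hand. The sketch below is a hypothetical illustration of that idea, not a feature of any shipping agent: the capability names and the allowed() helper are invented for the example.

```python
# Hypothetical deny-by-default permission gate: a task receives only the
# capabilities it needs, so a form-filling chore cannot reach firewall
# rules or email. Capability names are invented for illustration.

LOW_RISK = {"read_screen", "click", "type_text"}
SENSITIVE = {"send_email", "change_security_settings", "make_payment"}

def allowed(capability: str, granted: set[str], confirmed: bool = False) -> bool:
    """Deny by default; sensitive capabilities also need live user confirmation."""
    if capability not in granted:
        return False
    return confirmed if capability in SENSITIVE else True

# Usage: a tax-form task granted the low-risk set plus email.
granted = LOW_RISK | {"send_email"}
print(allowed("type_text", granted))            # True
print(allowed("send_email", granted))           # False: needs confirmation
print(allowed("send_email", granted, True))     # True: user confirmed
print(allowed("make_payment", granted))         # False: never granted
```

Scoping grants per task, rather than per agent, is what keeps a mistake on one chore from spilling into email, payments, or security settings.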