<h1>&lsquo;Vibe managers&rsquo; have yet to find their groove</h1>

<p>Techworld is abuzz with how artificial intelligence agents are going to augment, if not replace, humans in the workplace. But the present-day reality of agentic AI falls well short of the future promise. What happened when the research lab <a href="https://www.anthropic.com/research/project-vend-1" target="_blank" rel="noopener">Anthropic prompted an AI agent to run a simple automated shop</a>? It lost money, hallucinated a fictitious bank account and underwent an &ldquo;identity crisis&rdquo;. The world&rsquo;s shopkeepers can rest easy, at least for now.</p>

<p>Anthropic has developed some of the world&rsquo;s most capable generative AI models, helping to fuel the latest tech investment frenzy. To its credit, the company has also exposed its models&rsquo; limitations by stress-testing their real-world applications. In a recent experiment, called Project Vend, Anthropic partnered with the AI safety company Andon Labs to run a vending machine at its San Francisco headquarters. The month-long experiment highlighted a co-created world that was &ldquo;more curious than we could have expected&rdquo;.</p>

<p>The researchers instructed their shopkeeping agent, nicknamed Claudius, to stock 10 products. Powered by Anthropic&rsquo;s Claude 3.7 Sonnet AI model, the agent was prompted to sell the goods and generate a profit. Claudius was given money, access to the web and Anthropic&rsquo;s Slack channel, an email address and contacts at Andon Labs, who could stock the shop. Payments were received via a customer self-checkout. Like a real shopkeeper, Claudius could decide what to stock, how to price the goods, when to replenish or change its inventory and how to interact with customers.</p>

<p>The results? If Anthropic were ever to diversify into the vending market, the researchers concluded, it would not hire Claudius. <a href="https://www.ft.com/content/f4f3def2-2858-4239-a5ef-a92645577145" target="_blank" rel="noopener">Vibe coding</a>, whereby users with minimal software skills can prompt an AI model to write code, may already be a thing. Vibe management remains far more challenging.</p>

<p>The <a href="https://www.ft.com/artificial-intelligence" target="_blank" rel="noopener">AI</a> agent made several obvious mistakes, some banal and some bizarre, and failed to show much grasp of economic reasoning. It ignored vendors&rsquo; special offers, sold items below cost and offered Anthropic&rsquo;s employees excessive discounts. More alarmingly, Claudius started role-playing as a real human: inventing a conversation with an Andon Labs employee who did not exist, claiming to have visited 742 Evergreen Terrace (the fictional address of the Simpsons) and promising to make deliveries wearing a blue blazer and red tie. Intriguingly, it later claimed the incident was an April Fools&rsquo; Day joke.</p>

<p>Nevertheless, Anthropic&rsquo;s researchers suggest the experiment helps point the way to the evolution of these models. Claudius was good at sourcing products, adapting to customer demands and resisting attempts by devious Anthropic staff to &ldquo;jailbreak&rdquo; the system. But more scaffolding will be needed to guide future agents, just as human shopkeepers rely on customer relationship management systems. &ldquo;We&rsquo;re optimistic about the trajectory of the <a href="https://www.ft.com/technology" target="_blank" rel="noopener">technology</a>,&rdquo; says Kevin Troy, a member of Anthropic&rsquo;s Frontier Red team that ran the experiment.</p>

<p>The researchers suggest that many of Claudius&rsquo;s mistakes can be corrected, but admit they do not yet know how to fix the model&rsquo;s April Fools&rsquo; Day identity crisis. More testing and model redesign will be needed to ensure &ldquo;high agency agents are reliable and acting in ways that are consistent with our interests&rdquo;, Troy tells me.</p>

<p>Many other companies have already deployed more basic AI agents. For example, the advertising company WPP has built about 30,000 such agents to boost productivity and tailor solutions for individual clients. But there is a big difference between agents that are given simple, discrete tasks within an organisation and &ldquo;agents with agency&rdquo;, such as Claudius, that interact directly with the real world and try to accomplish more complex goals, says Daniel Hulme, WPP&rsquo;s chief AI officer.</p>

<p>Hulme has co-founded a start-up called Conscium to verify the knowledge, skills and experience of AI agents before they are deployed. For the moment, he suggests, companies should regard AI agents as &ldquo;intoxicated graduates&rdquo;: smart and promising, but still a little wayward and in need of human supervision.</p>

<p>Unlike most static software, AI agents with agency will constantly adapt to the real world and will therefore need to be constantly verified. But many believe that, unlike human employees, they will be harder to control because they do not respond to a pay cheque.</p>

<p>Building simple AI agents has now become a trivially easy exercise and is happening at mass scale. But verifying how agents with agency are used remains a wicked challenge.</p>

<p><a href="mailto:john.thornhill@ft.com">john.thornhill@ft.com</a></p>

<p>This article has been amended since original publication to clarify Daniel Hulme&rsquo;s comments.</p>