{"id":395924,"date":"2026-03-21T02:17:16","date_gmt":"2026-03-21T02:17:16","guid":{"rendered":"https:\/\/www.europesays.com\/ie\/395924\/"},"modified":"2026-03-21T02:17:16","modified_gmt":"2026-03-21T02:17:16","slug":"neuroscientists-and-military-vets-the-inner-workings-of-the-team-that-hacks-microsofts-ai-tools-before-their-public-debut-technology","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ie\/395924\/","title":{"rendered":"Neuroscientists and military vets: the inner workings of the team that \u2018hacks\u2019 Microsoft\u2019s AI tools before their public debut | Technology"},"content":{"rendered":"<p class=\"\">Microsoft president Brad Smith takes a moment to reflect before using the word \u201cguardrails\u201d with the ease of someone who has given a great deal of thought to the dangers of the abyss. A conference on the company\u2019s new release is being held at its headquarters in Redmond, Washington, to which this and other international newspapers have been invited, and EL PA\u00cdS has asked how and who determines whether the company\u2019s artificial intelligence can be used in the context of war, such as the current conflict in Iran. Just a few days ago, it was made public that artificial intelligence firm Anthropic has sued the Pentagon for blacklisting it after the company turned down a contract for the defense entity to utilize its technology. It is the current debate that is raging <a href=\"https:\/\/english.elpais.com\/technology\/2026-03-14\/by-your-command-my-robot-ai-war-games-spark-debate-about-ethical-limits.html\" target=\"_self\" rel=\"nofollow noopener\" title=\"https:\/\/english.elpais.com\/technology\/2026-03-14\/by-your-command-my-robot-ai-war-games-spark-debate-about-ethical-limits.html\">in the world of Big Tech<\/a>, and a very familiar issue at Microsoft. In 2021, the Pentagon canceled a $10 billion deal with the company following protests by its employees. 
Microsoft, in fact, has supported Anthropic in its fight.<\/p>\n<p class=\"\">Smith answers, \u201c<a href=\"https:\/\/english.elpais.com\/technology\/2026-03-15\/the-former-archdeacon-looking-to-put-limits-on-ai-with-an-ethical-code-the-problems-posed-today-have-been-the-subject-of-theological-reflection-for-hundreds-of-years.html\" target=\"_self\" rel=\"nofollow noopener\" title=\"https:\/\/english.elpais.com\/technology\/2026-03-15\/the-former-archdeacon-looking-to-put-limits-on-ai-with-an-ethical-code-the-problems-posed-today-have-been-the-subject-of-theological-reflection-for-hundreds-of-years.html\">We have principles<\/a>, we define them and we publish them. By definition, those principles create guardrails. And we stay in our lane within them. It\u2019s not just about when we should use technology, but also about when we shouldn\u2019t use it.\u201d<\/p>\n<p class=\"\">To assist in this process, Microsoft has a crew that hacks its own products: the red team. The name evokes a military legacy: red teams were first created by armies to simulate enemy attacks and detect vulnerabilities before a real adversary could. In cybersecurity, the practice has been established for decades. But applying it to generative artificial intelligence is relatively new, and Microsoft is credited as a pioneer in the field, having formed its team in 2018. 
\u201cBefore a product is launched, the red teams break the technology so that others can rebuild it <a href=\"https:\/\/english.elpais.com\/technology\/2026-02-03\/yoshua-bengio-turing-award-winner-there-is-empirical-evidence-of-ai-acting-against-our-instructions.html#?rel=mas\" target=\"_self\" rel=\"nofollow noopener\" title=\"https:\/\/english.elpais.com\/technology\/2026-02-03\/yoshua-bengio-turing-award-winner-there-is-empirical-evidence-of-ai-acting-against-our-instructions.html#?rel=mas\">to be more solid and secure<\/a>,\u201d explains Ram Shankar Siva Kumar, who self-identifies as a \u201cdata cowboy\u201d and leads the red team. \u201cAI can generate problems ranging from security failures to psychosocial damage. People use Copilot [Microsoft\u2019s AI] in moments of great vulnerability, so observing how these systems can fail before they reach the user is fundamental,\u201d he says.<\/p>\n<p class=\"\">His AI internal affairs team has already analyzed more than 100 of the company\u2019s products. Microsoft does not disclose how many people work on the team, nor how many or which products it has blocked from release. But Kumar does say that the team has the power to do so: \u201cNo high-risk AI system is implemented before undergoing an independent test. If our team identifies serious risks that have not been mitigated, the product is not released until those problems are resolved.\u201d<\/p>\n<p class=\"\">The question the team poses when it analyzes a product before its release is, \u201cHow could one use this AI system, for good or bad, within months or years?\u201d<\/p>\n<p>Six principles<\/p>\n<p class=\"\">The \u201cguardrails\u201d that Smith mentions are the six general principles that guide the team when it examines products: fairness, reliability and safety, privacy and security, transparency, accountability and inclusiveness. Every day, they translate these principles into concrete tools. 
\u201cIf you give an engineer a 50-page document so they can implement these principles, they\u2019re going to get overwhelmed. We have an open-source tool called Pyrit. We built it for ourselves, and then we made it available to the world, because we believe in the health of the ecosystem,\u201d says Kumar.<\/p>\n<p class=\"\">On the red team there are neuroscientists, linguists, national security experts, cybersecurity specialists, military veterans and even a formerly incarcerated individual \u201cwho rehabilitated themselves,\u201d says Kumar. They also speak 17 languages and \u201csome French, Mongolian, Thai and Korean dialects,\u201d according to the team\u2019s leader, which matters because one of the red team\u2019s obsessions, he says, is ensuring its AI avoids mistakes anywhere in the world.<\/p>\n<p class=\"\">Along with Kumar, the red team\u2019s operations are co-directed by Tori Westerhoff, whose background combines <a href=\"https:\/\/english.elpais.com\/science-tech\/2025-11-07\/first-map-of-the-developing-brain-provides-insight-into-origin-of-mental-disorders.html\" target=\"_self\" rel=\"nofollow noopener\" title=\"https:\/\/english.elpais.com\/science-tech\/2025-11-07\/first-map-of-the-developing-brain-provides-insight-into-origin-of-mental-disorders.html\">cognitive neuroscience<\/a> \u2014 she studied at Yale and was one of the first members of the Wharton Neuroscience Initiative \u2014 and national security strategy, having worked at intelligence and defense agencies. \u201cWhen we receive an assignment,\u201d she explains, \u201cwe simulate what could go wrong at the extremes of that technology\u2019s usage curve. 
My team delves into how to use that product, both as intended and in unintended ways, to identify the most extreme scenarios and help the product team replicate and mitigate them before anyone can encounter them in the real world,\u201d she says.<\/p>\n<p class=\"\">One example of her work was the red teaming, as her hackers call the practice internally, of GPT-5, the OpenAI (a Microsoft partner) model launched last August. They trained another AI to hack the program, automatically and at a scale that would be impossible for humans to accomplish.<\/p>\n<p class=\"\">When they tested GPT-5, the red team used Pyrit to automatically generate more than two million fake conversations. The AI continuously attacked the other AI for days, exploring combinations that would never occur to a human being. Finding these weak spots manually is an extremely slow process, which is why they delegated the work to a machine, \u201clike in Inception,\u201d says Kumar, a reference to the <a href=\"https:\/\/english.elpais.com\/culture\/2024-03-09\/christopher-nolans-cinematic-vision-wins-over-hollywood.html\" target=\"_self\" rel=\"nofollow noopener\" title=\"https:\/\/english.elpais.com\/culture\/2024-03-09\/christopher-nolans-cinematic-vision-wins-over-hollywood.html\">Christopher Nolan<\/a> movie in which characters enter dreams within dreams.<\/p>\n<p class=\"\">However, Westerhoff, Kumar and Daniel Krutz, who directs the company\u2019s Responsible AI office, emphasize one point: \u201cRed teaming can only be automated to a certain extent, and only humans can determine whether an AI-generated response feels off or reflects a bias,\u201d the company states. The judgment is made by the person; the scale comes courtesy of the machine. 
That division of labor defines the team\u2019s philosophy.<\/p>\n<p class=\"\">Westerhoff believes, in fact, that only the human mind is capable of \u201cimagining the spaces that have not yet been observed, that are not completely defined or explored. Our work consists of innovating and creating beyond the space that has been systematized.\u201d<\/p>\n<p class=\"\">The team has identified three areas in which automation is blind by definition and human judgment is essential. The first concerns subject matter: people are needed to evaluate risk in fields like medicine and security. The second concerns the places where the AI will be launched. \u201cWe need humans to take linguistic differences into account and to redefine what constitutes damage in different political and cultural contexts,\u201d says the company. The third is emotional intelligence: only humans can evaluate the range of interactions that users may have with AI systems. A model can pass every automated test and still produce responses that would be disturbing for a real person in an actual situation.<\/p>\n<p class=\"\">This way of seeing AI aligns with the vision of Mustafa Suleyman, a co-founder of DeepMind (now part of Google) and CEO of Microsoft AI. A few days ago, he wrote in the journal Nature that an apparently conscious AI could become a weapon. As artificial intelligence systems increasingly mimic the structure of human language, he argues, we need design standards and laws to prevent them from being mistaken for sentient beings. \u201cThey must remain fundamentally accountable to humans and be subject to the well-being of humanity,\u201d writes Suleyman. 
\u201cAI agents should have no more rights or freedoms than my laptop.\u201d<\/p>\n<p class=\"\">The central philosophy underpinning the red team\u2019s work is, in short, that \u201cresponsible AI is not a filter applied at the end of development, but a foundational part of the process,\u201d says Kumar. These are Smith\u2019s guardrails, which do not actually act as brakes, but as a condition for going fast and not crashing.<\/p>\n<p class=\"\">Sign up for <a href=\"https:\/\/plus.elpais.com\/newsletters\/lnp\/1\/333\/?lang=en\" rel=\"nofollow noopener\" title=\"https:\/\/plus.elpais.com\/newsletters\/lnp\/1\/333\/?lang=en\" target=\"_blank\">our weekly newsletter<\/a> to get more English-language news coverage from EL PA\u00cdS USA Edition<\/p>\n","protected":false},"excerpt":{"rendered":"Microsoft president Brad Smith takes a moment to reflect before using the word \u201cguardrails\u201d with the ease of&hellip;\n","protected":false},"author":2,"featured_media":395925,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[74],"tags":[6006,30852,18,823,19,17,305,3618,82],"class_list":{"0":"post-395924","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technology","8":"tag-anthropic","9":"tag-christopher-nolan","10":"tag-eire","11":"tag-google","12":"tag-ie","13":"tag-ireland","14":"tag-microsoft","15":"tag-nature","16":"tag-technology"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@ie\/116264766277075320","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/395924","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":t
rue,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/comments?post=395924"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/posts\/395924\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media\/395925"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/media?parent=395924"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/categories?post=395924"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ie\/wp-json\/wp\/v2\/tags?post=395924"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}