{"id":28125,"date":"2026-05-05T15:23:07","date_gmt":"2026-05-05T15:23:07","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/28125\/"},"modified":"2026-05-05T15:23:07","modified_gmt":"2026-05-05T15:23:07","slug":"advancing-ai-evaluation-with-the-center-for-ai-standards-us-and-innovation-and-the-ai-security-institute-uk","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/28125\/","title":{"rendered":"Advancing AI evaluation with the Center for AI Standards (US) and Innovation and the AI Security Institute (UK)"},"content":{"rendered":"<p>Today, Microsoft is announcing new agreements with the\u00a0<a href=\"https:\/\/www.nist.gov\/caisi\/\" rel=\"nofollow noopener\" target=\"_blank\">Center\u00a0for AI Standards and Innovation<\/a>\u00a0(CAISI)\u00a0in the US\u00a0and the\u00a0<a href=\"https:\/\/www.aisi.gov.uk\/\" rel=\"nofollow noopener\" target=\"_blank\">AI Security Institute<\/a>\u00a0(AISI)\u00a0in the UK\u00a0to advance the science of AI testing and evaluation, including through collaborative work to test Microsoft\u2019s frontier models, assess safeguards, and help mitigate national security and large-scale public safety risks.\u00a0These agreements matter because ongoing,\u00a0rigorous testing is\u00a0essential\u00a0to building trust and confidence in advanced AI systems. Well-constructed tests help us understand whether\u00a0our\u00a0systems are working as intended and delivering\u00a0the\u00a0benefits\u00a0they are\u00a0designed\u00a0to provide. Testing also helps us stay ahead of\u00a0risks, such as\u00a0AI-driven cyberattacks\u00a0and\u00a0other\u00a0criminal\u00a0misuses\u00a0of AI systems, that can\u00a0emerge\u00a0once\u00a0advanced AI\u00a0systems are deployed in the world.\u00a0<\/p>\n<p>While Microsoft regularly undertakes many types of AI testing on its own,\u00a0testing\u00a0for\u00a0national security and large-scale public safety\u00a0risks\u00a0necessarily\u00a0must\u00a0be\u00a0a collaborative endeavor with governments. This type of testing depends on deep technical, scientific, and national security\u00a0expertise\u00a0that is uniquely held by institutions like\u00a0CAISI\u00a0in the US\u00a0and\u00a0AISI\u00a0in the UK\u00a0and the\u00a0government\u00a0agencies they work with. By combining that government\u00a0expertise\u00a0with Microsoft\u2019s experience building and deploying AI\u00a0systems\u00a0at\u00a0global scale,\u00a0together\u00a0we\u00a0are better positioned to\u00a0anticipate\u00a0and manage national security and public safety risks in ways that build public trust and confidence in advanced AI systems.\u00a0\u00a0<\/p>\n<p>Improving\u00a0AI\u00a0evaluation science\u00a0through\u00a0cooperative\u00a0research\u00a0and operational experience\u00a0<\/p>\n<p>Advancing the science of AI\u00a0evaluation\u00a0requires more than isolated research or one-off\u00a0testing. 
It depends on sustained collaboration between industry, government, and research institutions. Through our new and expanded partnerships with the US and UK governments, alongside national security-focused evaluations of model capabilities, Microsoft is bringing technical expertise and operational experience to strengthen AI evaluation methods and practical testing foundations.

In the US, with CAISI, Microsoft and NIST will collaborate on improving methodologies for adversarial assessments: testing AI systems in ways that probe unexpected behaviors, misuse pathways, and failure modes, much like stress-testing whether airbags, seatbelts, and braking systems work effectively and reliably in safety-critical driving scenarios. This work involves co-developing more systematic and reproducible approaches to evaluation, including shared frameworks, datasets, and workflows for assessing safety, security, and robustness risks in advanced AI systems. It also builds on our AI Red Team's novel research and tools [to detect compromised models at scale](https://www.microsoft.com/en-us/security/blog/2026/02/04/detecting-backdoored-language-models-at-scale/).

In the UK, with AISI, Microsoft will collaborate on research related to frontier safety and security, including methods for evaluating high-risk capabilities and the effectiveness of the safeguards used to address them. The partnership will also include societal resilience research examining how conversational AI systems interact with users in sensitive contexts.

These collaborations are designed to improve measurement science, evaluation methodologies, practical testing workflows, and real-world mitigation impact. They reflect a shared commitment to rigorous, practical approaches that can make safeguards stronger and evaluations more reliable.

## Looking ahead

No organization can address these challenges alone. Our partnerships with CAISI and AISI are a key part of a wider effort to build the institutions, research base, and shared methodologies needed for effective AI testing. This effort also includes:

- Pursuing research and evaluation in collaboration with other AI institutes globally, while helping advance shared priorities and methodologies for testing through the International Network for AI Measurement, Evaluation and Science.
- Helping deliver industry best practices through the [Frontier Model Forum](https://www.frontiermodelforum.org/) (FMF), an initiative dedicated to advancing the science and practice of frontier AI safety and security.
  Through the FMF, we are working with other leading AI developers to support independent research, develop shared evaluation methodologies, and promote transparency around risk mitigation strategies.
- Contributing to [MLCommons](https://mlcommons.org/), a multistakeholder non-profit that develops and operationalizes testing tools such as [AILuminate](https://mlcommons.org/ailuminate/), a family of safety and security benchmarks. In February, we [announced](https://blogs.microsoft.com/on-the-issues/2026/02/17/acting-with-urgency-to-address-the-growing-ai-divide/) efforts underway with institutions in India, Japan, Korea, and Singapore to [expand AILuminate](https://mlcommons.org/ailuminate/ailuminate-multimodal/) to support multilingual, multicultural, and multimodal evaluation, helping to make sure that AI systems work well in the languages and cultural contexts in which people around the world use them.

As AI capabilities advance, so too must the rigor of the testing and safeguards that underpin them. We will apply what we learn from these partnerships directly to how we design, test, and deploy AI systems, ensuring that progress in evaluation science translates into safer, more secure products for our customers. As these partnerships progress, we will share what we learn and look for opportunities to apply insights and best practices to AI testing more broadly.