To Secure AI, Start Thinking Like an Attacker

By Peter Garraghan, Mindgard

The rapid adoption of AI has unleashed a flood of innovation, but it has also exposed a glaring cybersecurity bottleneck of the industry’s own making. Organizations are racing to implement generative models, autonomous agents, and AI-enhanced services, yet too often without any real assurance that these systems are secure. According to a recent World Economic Forum report, 66% of businesses expect AI to impact cybersecurity profoundly in the coming year, but only 37% currently assess the security of AI tools before deployment. This is a dangerous paradox: recognition without readiness.

We’ve seen this before. During the rise of cloud computing, security was often an afterthought, until the breaches began. With AI, however, the risks are not only faster and more complex, but also fundamentally different. Treating AI like any other software is a category error. These systems are non-deterministic, probabilistic, and deeply entangled in application workflows. They require a new playbook.

Your AI Can (and Likely Will) Be Hacked

Without sounding too apocalyptic, AI at its core still involves risks related to software, hardware, and data. These are not unfamiliar to security practitioners, who already have controls and processes in place. However, the emergence of a new technological paradigm requires adapting existing tools, training, and playbooks — and AI is no exception.

AI models today expose entirely novel attack surfaces. Prompt injection, jailbreaks, adversarial chaining, and model extraction are not speculative threats. They are active techniques already being used in the wild. Consider “Ghost in the Shell” scenarios, where sensitive data can be resurrected from a model’s memory long after the original dataset has been deleted. Or data poisoning campaigns, such as the manipulation of open-source training, designed to alter downstream model behavior and spread disinformation.

These attacks can take just minutes to execute and often require minimal technical skill. In fact, journalists with no specialist hacking background demonstrated that OpenAI’s ChatGPT search tool was vulnerable to hidden-text prompt injection. By embedding invisible instructions into web pages, attackers were able to manipulate ChatGPT into generating misleading outputs, such as artificially positive product reviews, and even returning malicious code. If that is what non-experts can achieve, imagine what skilled threat actors are capable of.

OWASP Is Just the Start

AI security is a rapidly moving target. As models become more capable and embedded in critical workflows, new classes of vulnerabilities will inevitably emerge. Some are due to advances in model architecture, others to complex human-AI interaction patterns. Attackers are already probing areas like model extraction, watermark removal, prompt injection chaining, and indirect prompt exploitation through third-party tools.

Frameworks like OWASP and MITRE help categorize risks, but they are not executional. A single line item — say, “LLM02:2025 Sensitive Information Disclosure” — masks dozens of specific attack vectors. Moreover, attack categories continue to expand beyond the model itself:

Systemic integration risks occur when AI models interact with plugins, APIs, or orchestration layers like RAG pipelines. These interfaces often become entry points for serialization attacks, privilege escalation, or command injection. These risks are frequently missed by conventional security tools.
Runtime-only threats emerge under live input conditions. These include context overflow, logic corruption, and behavior drift, which only manifest under operational stress or dynamic user interaction. They are not detectable through static testing.
Data exposure and memory-based attacks exploit models that retain conversational context or ingest user data during inference. These can result in sensitive information leaking through outputs, logs, or the misuse of fine-tuned models.

Just as traditional software security evolved from buffer overflows to sophisticated supply chain attacks, AI security is now on a similar trajectory. The space is early, and the discovery curve is steep.

Executive-Level Concerns of AI Adoption

These technical vulnerabilities, if left untested, do not exist in isolation. They manifest as broader organizational risks that extend well beyond the engineering domain. When viewed through the lens of operational impact, the consequences of insufficient AI security testing map directly to failures in safety, security, and business assurance. Categorizing the risk landscape this way helps translate technical threats into executive-level priorities.

Safety risks involve failures in model behavior that lead to harmful or unintended outcomes. This includes misaligned outputs, instruction-following errors, and toxic or biased responses. Without adversarial prompt testing and stress testing under edge conditions, models may behave unpredictably or damage reputations, especially in regulated or customer-facing environments.
Security risks stem from adversarial exploitation of the model or its surrounding system. This includes prompt injection, jailbreaks, remote code execution via plugin interfaces, and data leakage through model outputs or persistent context. These vulnerabilities often escape traditional scanning tools and are enabled by tokenization quirks, malformed inputs, or untrusted integrations.
Business risks arise when AI tools fail to meet operational or compliance standards. This includes regulatory violations from unauthorized data processing, system outages due to untested model behavior at scale, and hidden costs from cascading system failures. These risks increase when AI tools are deployed in decision-making workflows without formal assurance processes.

These are no longer theoretical. In regulated sectors such as finance, healthcare, and critical infrastructure, such failures could be catastrophic.

Toward Adaptive, AI-Native Security

Just as DevSecOps transformed how we deliver software, we now need adversarial AIOps: AI security embedded from model development to production runtime. Automated AI-specific security testing must become a standard part of CI/CD pipelines, not a niche capability.

Ultimately, the security bottleneck in AI isn’t due to a lack of awareness. It’s due to inertia. We’re applying yesterday’s tools to tomorrow’s risks. And unless we fix that, the promise of AI will remain shackled by vulnerabilities we chose not to confront.

About the author:

Dr. Peter Garraghan is CEO & co-founder at Mindgard, the leader in artificial intelligence security testing. Founded at Lancaster University and backed by cutting-edge research, Mindgard enables organizations to secure their AI systems from new threats that traditional application security tools cannot address. As a professor of computer science at Lancaster University, Peter is an internationally recognized expert in AI security. He has devoted his career to developing advanced technologies to combat the growing threats facing AI. With over €11.6 million in research funding and more than 60 published scientific papers, his contributions span both scientific innovation and practical solutions.

To Secure AI, Start Thinking Like an Attacker

Tags: