AI data protection

One of the most fascinating aspects of AI’s journey from experimentation to production deployment has been the limitations it exposes once it leaves the sandbox. Maybe it’s a proof of concept spun up in the cloud that succeeds until it hits a barrier to scale. Or a carefully mapped infrastructure budget that goes out the window three years before the next refresh. Or a sandbox environment whose security approaches work fine in the lab but break down when AI is scaled across an enterprise data estate.

If that last one sounds alarming, it should.

In the controlled environment of a data science lab, success is relative, and security can be, too. Sandboxes often bear little resemblance to the complex, distributed reality of enterprise AI, where these systems must ultimately deliver on their promise. Security measures that seem manageable with gigabytes of test data create vulnerabilities (and compliance and governance headaches) when terabytes of customer information and other sensitive data fill the pipeline.

The security gap in AI data architectures

IBM’s 2025 Cost of a Data Breach Report revealed that 13% of organizations experienced breaches of AI models or applications, with 97% of those compromised lacking proper AI access controls.

Bolting on multi-vendor solutions that treat storage as a “passive” component leaves organizations vulnerable: they can miss crucial threat signals hidden in their data. And ransomware targeting AI data lakes and model repositories thrives on the fragmentation between storage platforms and cybersecurity tools.

When attackers target your AI training datasets and model checkpoints — the intellectual property that represents months or years of investment — traditional backup strategies won’t be enough.

Once organizations start depending on AI-driven decisioning, automation, or analytics, any outage or data corruption can be as damaging as a breach itself. Resilience becomes as critical as security. AI resilience means ensuring the continuous availability and recoverability of models, data sets, and pipelines so that business operations aren’t disrupted when (not if) something fails. True protection combines security, integrity, and rapid recovery, because an AI system that’s “secure” but offline still costs the business dearly.

The AI data governance factor

A recent study found that major AI governance frameworks (e.g., NIST, ALTAI, UK’s toolkit) leave large portions of risk unaddressed, with compliance-security gaps as high as 80%.

Here’s where AI adoption collides with a wave of new compliance requirements. From the EU’s AI Act to rapidly tightening cross-border data regulations in the U.S. and Asia, enterprises face a landscape where governance is no longer optional. Enterprises now face blind spots not just in how AI models behave, but also in how data flows across borders, how sensitive information is logged, and how regulatory audits are conducted. In financial services, regulators require that every transaction be logged with context to understand how AI behaved and what may have happened in the event of an issue.

Compliance obligations also go beyond “checkbox” security. Regulators increasingly demand hard evidence that data is encrypted, that immutable logs exist, that recovery procedures are tested, and that governance extends across the entire AI pipeline, from training datasets to deployed inference workloads.
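To make the “immutable logs” requirement concrete, here is a minimal sketch of a hash-chained audit log, the pattern regulators expect such evidence to follow. All names here are illustrative, not a Pure Storage API: each entry embeds the hash of the previous one, so any after-the-fact tampering is detectable.

```python
import hashlib
import json
import time

def append_entry(log, event):
    """Append an event to a hash-chained audit log.

    Each entry embeds the hash of the previous entry, so modifying
    or deleting any historical entry breaks the chain.
    """
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    body["hash"] = digest
    log.append(body)
    return body

def verify_chain(log):
    """Recompute every hash; return True only if the whole chain is intact."""
    prev = "0" * 64
    for entry in log:
        body = {k: entry[k] for k in ("ts", "event", "prev")}
        if entry["prev"] != prev:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

In production this chain would be anchored in write-once storage so the log itself cannot be rewritten; the point of the sketch is that auditors can verify integrity independently, rather than taking a “checkbox” on faith.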

The cloud: Where AI problems generally start

Here’s the heart of the AI problem: Most AI initiatives start in the cloud, where controls are often kept deliberately light to accelerate model development.

That’s fine for a while. But eventually, companies want to integrate their private, confidential, and valuable data with these models via retrieval-augmented generation (RAG). That’s where the problems arise: most companies, especially large, regulated ones, won’t trust putting that data into an untrusted model in a public space, which means they have to bring the models back on-premises to a trusted, controlled environment to do that integration work.

This cloud-to-on-premises migration brings its own problems: the company now has to migrate both the model and the data, which can take significant time, keep very expensive resources (i.e., data scientists) fully occupied, and dramatically slow the development of the service.

Is Zero Trust the answer?

While zero trust architecture (ZTA) isn’t a catchall, it’s a vast improvement on traditional network security techniques. Its foundational principle is this: Assuming trust anywhere is a flawed approach. Implicit trust should never be granted to a user, device, or application based solely on its location on a secure network. Every user and system must authenticate and validate its identity before accessing network resources.

The principle of least privilege has always been a cornerstone of ZTA: give users only the access they need, nothing more. But the rise of AI has changed things. And remember that zero trust isn’t just about policies and guardrails; it’s about hygiene. Keeping IAM directories free of stale accounts, patching systems to eliminate known exploits, and conducting regular application security reviews are the day-to-day practices that make ZTA workable at scale. Without this hygiene, even the strongest guardrails can fail.

AI copilots and autonomous agents now interact with corporate systems, often with broad privileges. If overprovisioned, they can exfiltrate data or trigger changes at machine speed. These non-human identities (NHIs) include APIs, service accounts, microservices, IoT devices, and increasingly, AI agents. They outnumber human accounts by more than 80:1, and many use hard-coded credentials, shared keys, or static tokens — exactly the kinds of “back doors” ZTA was meant to eliminate.

To address this, companies must apply the same rigor to machines as they do to humans: credential vaulting, frequent credential rotation, attribute-based access policies, and behavioral monitoring. The future is dynamic guardrails: AI-enforced least privilege, where policies adapt continuously to user, device, and contextual risk signals instead of relying on static rules.
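A dynamic guardrail of this kind can be sketched as a risk-scored policy decision. The weights, thresholds, and signal names below are illustrative assumptions, not any vendor’s policy engine; the point is that the decision changes with context (credential age, device trust, behavioral anomalies) rather than a static rule.

```python
from dataclasses import dataclass

@dataclass
class AccessContext:
    identity: str            # human or non-human identity, e.g., an AI agent
    credential_age_days: int  # time since last rotation
    device_trusted: bool
    anomaly_score: float      # 0.0 (normal) .. 1.0 (anomalous), from monitoring

def risk_score(ctx: AccessContext) -> float:
    """Combine contextual signals into a single risk value in [0, 1]."""
    score = 0.0
    if ctx.credential_age_days > 30:   # stale credential that should have rotated
        score += 0.4
    if not ctx.device_trusted:
        score += 0.3
    score += ctx.anomaly_score * 0.3   # weight behavioral anomalies
    return min(score, 1.0)

def decide(ctx: AccessContext, sensitivity: str) -> str:
    """Allow, require step-up authentication, or deny.

    More sensitive resources tolerate less risk (lower threshold).
    """
    thresholds = {"low": 0.8, "medium": 0.5, "high": 0.2}
    risk = risk_score(ctx)
    if risk <= thresholds[sensitivity]:
        return "allow"
    if risk <= thresholds[sensitivity] + 0.3:
        return "step_up_auth"
    return "deny"
```

Because the same evaluation applies to a service account or an AI agent as to a person, an overprovisioned non-human identity with a stale key and anomalous behavior is denied at machine speed, rather than trusted on the strength of a static token.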

Pure Storage cyber-aware, governance-first AI infrastructure

Our approach makes the storage layer an active participant in AI data governance, compliance, threat detection, and response. Pure Storage directly addresses the “most AI issues start in the cloud” scenario with offerings like Pure Storage Cloud Dedicated (Cloud Block Store), which brings data portability and reduces the data gravity often experienced in standard cloud storage environments. Pure Storage also provides a number of other features, such as snapshots for fast checkpointing during AI training and SafeMode™ to protect data from potentially hostile actions.

Pure Fusion™ serves as an intelligent control plane, automatically registering FlashArray™ systems in security workflows and enabling policy enforcement across all AI infrastructure. This creates seamless governance that maintains consistent oversight regardless of whether AI workloads run in public, private, or on-premises environments.

Pure Protect™ recovery zones automatically provision isolated recovery environments for non-disruptive testing and validation of AI applications and data sets, ensuring seamless continuity. This enables organizations to remediate and recover from malicious attacks without impacting production AI workloads, providing immediate restoration capabilities for mission-critical AI factories.

With our extensive native threat detection and partnerships with CrowdStrike, Veeam, and Superna, we have an extended threat detection network that shares bi-directional threat signals across the entire AI data pipeline. We also built our own Threat Model Mentor GPT at Pure Storage, an AI-powered tool that automates threat modeling to help democratize cybersecurity expertise.

The NVIDIA AI Factory Security Foundation

As an NVIDIA-Certified Storage Partner with both Foundation- and Enterprise-level validation, we meet the highest standards of quality, efficiency, and reliability for AI factories. The platform features TPM and UEFI secure boot capabilities, enterprise-grade identity and access management, and bring-your-own-key encryption for multi-tenant environments. This architecture supports NVIDIA’s Enterprise Reference Architectures with validated configurations from 4 to 1,024+ GPUs, and the unified software stack, combining NVIDIA Base Command Manager, NVIDIA AI Factories, Portworx® by Pure Storage, and NVIDIA Run:ai, creates policy-driven security orchestration across the entire AI pipeline.

Security-first storage for AI: The real silver bullet

As I stated earlier, AI has done it again: It has exposed an architectural limitation in traditional storage and demanded something better. Even organizations not investing in large-scale AI can recognize this important signal: AI won’t be the last big disruption, but they can weather it and be ready for whatever comes next.


The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends.