When AI agents act, security has to keep up

The organizations that get the most from agentic AI will be those that understand the threat model clearly enough to design against it.

Rob Smith

April 29, 2026 2:15 pm

4 min read

Today’s AI assistants find answers. AI agents take action. They can already autonomously read codebases, write and edit files, run tests and fix bugs from a single prompt, and before long, they’ll handle everything from booking business travel to processing procurement requests, using your credentials to get it done.

That’s powerful. It’s also a significant responsibility. The Center for AI Standards and Innovation, an arm of the National Institute of Standards and Technology (NIST), has become concerned enough about the risks of agentic AI to begin gathering information on how to track the development and deployment of these tools.

“AI agent systems are capable of taking autonomous actions that impact real-world systems or environments, and may be susceptible to hijacking, backdoor attacks, and other exploits,” NIST noted in a

IT leaders need to understand not just what agents can do, but what that capability means for their security posture. Agentic AI reshapes and expands the attack surface, creating far greater potential for vulnerabilities to emerge, including interactions between agents that traditional security models weren’t built to catch.

]]>

New capabilities, new challenges

The nature of large language models — and agentic AI in particular — creates a variety of security challenges; some are entirely new, others are twists on long-standing issues.

One of the biggest risks combines a core strength of AI bots — the ability to process natural language — with agentic AI’s instruction-following behavior. The combination enables “prompt injection” attacks, where malicious instructions embedded in otherwise legitimate content or requests can manipulate agent behavior. The probabilistic nature of LLMs compounds this issue: the same prompt-injection attack may succeed or fail on different attempts, making defenses difficult to validate comprehensively.

A particular challenge also arises from the combination of capabilities within a single agent. AI agents merge language model reasoning with tool access — the ability to read files, query databases, call application programming interfaces (APIs), execute code and interact with external services. The risks emerge not from any single capability, but from their combination and an agent’s ability to execute these actions autonomously.

An agent with access to private data, exposure to untrusted content and the ability to communicate externally presents a materially different risk profile than one lacking any of these three elements. Some observers have described this combination as the “

Other risks include:

Unintended operations, where agents execute actions beyond their intended scope due to misinterpreted instructions or prompt manipulation.
Privilege escalation, where agents operating with broad permissions may perform sensitive operations that exceed what the initiating user intended.
Cascading failures, where one compromised agent in a multi-agent system can corrupt others downstream.

Addressing the risks

IT leaders and other federal workers can take concrete steps to mitigate these risks. Effective security requires layered controls at three levels.

Model level: Maintain clear separation between system instructions and untrusted content using distinct messaging roles and randomized delimiters. Secondary classifiers provide an additional layer, scanning inputs and outputs for injection patterns and anomalous formatting. These are risk-reduction measures rather than complete solutions, which is precisely why the layers below matter.
System level: Apply least privilege across the board. Agents should only access the tools required for their tasks, with credentials narrowly scoped and set to expire quickly. Validate content entering the system for injection patterns and screen content leaving it for sensitive information, like credentials or personally identifiable information. Enforce default-deny network controls, limiting external communication to explicitly approved endpoints. And design workflows to break the “lethal trifecta:” separating read-only and write-capable agents ensures no single agent can access sensitive data, process untrusted content and communicate externally all at once.
Human oversight level: Require explicit approval for critical operations while allowing lower-risk actions to proceed with notification. Tiering your approach prevents the approval fatigue that leads to bypassed oversight. Users should be able to halt execution at any time, with rollback of partially completed work where possible. When an agent acts on behalf of a user, record both identities and evaluate permissions at their intersection. Log all agent actions, timestamps, identifiers, tools invoked, resources accessed and outcomes in sufficient detail to reconstruct events after the fact.

The opportunity outweighs the risks

The risks are real, but so is the opportunity, and it would be a mistake to let one obscure the other.

]]>

Consider what agents look like when working for you rather than against you. The right combination of data access, content processing and external communication, when properly governed, is exactly what makes agents powerful tools. AI agents can monitor systems, apply consistent security rules without fatigue and respond to threats at a speed and scale no manual process can match. They’re a force multiplier.

Human security teams will always be necessary. But teams that deploy agents as a defensive tool will have a meaningful advantage over those that don’t: faster detection, faster remediation and fewer of the human errors that attackers count on.

The organizations that get the most from agentic AI will be those that understand the threat model clearly enough to design against it.

Rob Smith is public sector area vice president at GitLab.

When AI agents act, security has to keep up

Tags: