
Sponsored byProtegrity
The earlier piece in this series, “Rules fail at the prompt, succeed at the boundary,” discussed the inaugural AI-driven espionage initiative and the shortcomings of prompt-level management. This piece serves as the solution. The inquiry now posed to every CEO by their board is some variation of: What steps should we take regarding agent risk?

In recent AI security recommendations from authorities, regulators, and key vendors, a fundamental principle recurs: consider agents as formidable, semi-autonomous users, and enforce protocols at the borders where they interact with identity, tools, data, and results.
Here is a practical eight-step strategy for teams to execute and report on:

Limit abilities
These actions are designed to outline identity and restrict abilities.
1. Identity and scope: Transform agents into genuine users with defined roles
Currently, agents operate under ambiguous, overly permissive service identities. The correction is simple: consider each agent a non-human entity with the same rigor applied to employees.
Each agent ought to operate as the requesting user within the appropriate tenant, with rights confined to that user’s role and location. Prohibit cross-tenant on-behalf-of shortcuts. High-impact actions necessitate explicit human consent with a documented justification. This aligns with how Google’s Secure AI Framework (SAIF) and NIST AI’s access control recommendations are intended to work in real scenarios.
The CEO inquiry: Can we currently provide a detailed list of our agents and their specific permissions?
2. Tool control: Specify, approve, and limit the tools agents may use
The Anthropic espionage framework was effective because attackers could link Claude with a dynamic assortment of tools (e.g., scanners, exploit frameworks, data parsers) via Model Context Protocol, which were not restricted or governed by policy.
The safeguard is to treat toolchains like a supply chain:
- Lock versions of external tool servers.
- Obtain approvals for adding new tools, scopes, or data sources.
- Prohibit automatic tool-chaining unless explicitly permitted by a policy.
This directly corresponds to what OWASP highlights under excessive agency and what it advises safeguarding against. According to the EU AI Act, designing for such cyber-resilience and misuse defense is part of the Article 15 obligation for ensuring robustness and cybersecurity.
The CEO inquiry: Who authorizes when an agent acquires a new tool or expanded scope? How can this be tracked?
3. Permissions by design: Link tools to tasks, not to models
A prevalent poor practice is granting the model a long-term credential while hoping prompts maintain decorum. SAIF and NIST advocate for the contrary: credentials and scopes should be tied to tools and tasks, routinely rotated, and subject to auditing. Consequently, agents request narrowly defined abilities via those tools.
This can be exemplified as: “finance-ops-agent may read but cannot write certain ledgers without CFO authorization.”
The CEO inquiry: Can we revoke a specific capability from an agent without needing to redesign the entire system?
Regulate data and actions
These measures oversee inputs, outputs, and limit actions.
4. Inputs, memory, and RAG: View external content as adversarial until validated
The majority of agent-related issues arise from deceptive data: a malicious webpage, PDF, email, or repository that clandestinely introduces antagonistic instructions into the system. OWASP’s prompt-injection cheat sheet and OpenAI’s guidance underscore the necessity of distinguishing system instructions from user content and treating unverified retrieval sources as unreliable.
Operationally, a filter should be implemented before anything enters retrieval or long-term memory: new sources undergo review, tagging, and onboarding; persistent memory should be disabled when untrusted contexts are involved; lineage should be assigned to each data chunk.
The CEO inquiry: Can we list every external content source our agents learn from and who permitted them?
5. Output handling and execution: No execution “just because the model said so”
In the Anthropic scenario, AI-generated exploit code and credential leaks proceeded immediately into action. Any output capable of causing a side effect requires a validator between the agent and the external environment. OWASP’s insecure output handling category explicitly addresses this, as do security best practices for browsers regarding origin boundaries.
The CEO inquiry: Where in our architecture do we evaluate agent outputs before they execute or are dispatched to clients?
6. Data privacy during execution: Safeguard the data first, then the model
Secure the data such that nothing hazardous is exposed by default. NIST and SAIF favor “secure-by-default” methodologies where sensitive information is tokenized or masked and only revealed for authorized users and use cases.
In agentic environments, this involves policy-regulated detokenization at the output boundary and documenting every exposure. If an agent is completely compromised, the potential damage is limited to what the policy permits it to access.
This is where the AI infrastructure intersects not solely with the EU AI Act but also with GDPR and industry-specific regulations. The EU AI Act requires providers and operators to manage AI-related risks; runtime tokenization and policy-controlled revelations are strong indicators of active risk management in operational settings.
The CEO inquiry: When our agents engage with regulated data, is that security enforced through architecture or merely by promises?
Demonstrate governance and resilience
In the concluding steps, it’s essential to demonstrate that controls are effective and persist in their operation.
7. Ongoing assessment: Don’t deploy a one-time evaluation, deploy a testing framework
Anthropic’s research on sleeper agents should dispel any illusions about singular test scenarios and illustrate the importance of continual evaluation. This involves equipping agents with comprehensive observability, regularly simulating adversarial conditions with testing suites, and supporting everything with thorough logging and evidence, so failures can become both regression tests and enforceable updates to policy.
The CEO inquiry: Who actively attempts to undermine our agents weekly, and how do their findings influence policy?
8. Governance, inventory, and auditing: Maintain accountability in one centralized location
AI security protocols stress the importance of inventory and documentation: enterprises must be aware of which models, prompts, tools, datasets, and vector stores are in use, who is responsible for them, and what risk-related decisions have been made.
For agents, this translates to a dynamic catalog and unified records:
- Which agents are operational, and on which platforms
- What scopes, tools, and data each is permitted to utilize
- Every approval, detokenization, and significant action, along with who authorized it and the date
The CEO inquiry: If asked how an agent reached a specific decision, could we retrace the decision-making process?
And it’s crucial to remember the overarching threat model: consider the threat actor GTG-1002 as if they are already within your organization. For complete organizational readiness, broaden your perspective and evaluate the MITRE ATLAS product, which exists precisely because adversaries breach systems, not models. Anthropic offers a case study of a state-sponsored threat actor (GTG-1002) executing precisely that within an agentic framework.
In summary, these controls do not render agents inherently secure. They accomplish something more familiar and dependable: they reinstate AI, its access, and actions within the same security framework utilized for any potent user or system.
For boards and CEOs, the inquiry has shifted from “Are our AI safeguards adequate?” to: Can we address the CEO inquiries mentioned earlier with solid evidence, not mere reassurances?
This content was created by Protegrity. It was not authored by the editorial staff of MIT Technology Review.