Securing
AI Agents
// Threat landscape · Controls · Decision flows
Prompt Injection
Malicious instructions embedded in user input or retrieved content override the agent’s intended behavior.
Data Exfiltration
Agent is tricked into leaking sensitive data through tool calls, API requests, or generated responses.
Privilege Escalation
Agent gains or is granted permissions beyond what’s needed, enabling unauthorized system access.
Supply Chain Attack
Compromised tools, plugins, or MCP servers inject malicious behavior into the agent’s workflow.
Uncontrolled Actions
Agent takes irreversible real-world actions (send email, delete files, make payments) without verification.
Memory Poisoning
Malicious content is injected into the agent’s persistent memory, corrupting future sessions.
Input Sanitization
Strip or escape special tokens, system prompt delimiters, and injection patterns from all user input before passing to the model.
filter(input, patterns=["/ignore previous/", "DAN", "jailbreak"])
Least Privilege Tools
Grant agents only the minimum permissions needed per task. A research agent should never have write access to production systems.
agent.tools = ["read_db", "search_web"] # not write_db
Human-in-the-Loop
Require human approval for irreversible actions: sending emails, financial transactions, deleting data, or external API calls with side effects.
if action.is_irreversible: await human_approval(action)
Audit Logging
Log every tool call, model decision, and action with full context. Immutable logs enable forensic analysis after incidents.
log.write(ts, agent_id, action, inputs, outputs, user)
Sandboxed Execution
Run agent tool calls in isolated environments (containers, VMs) with network egress control and resource limits.
docker run --network=none --memory=512m agent_tool
Output Validation
Validate and filter agent outputs before delivery. Check for PII leakage, unexpected data formats, and policy violations.
output = redact_pii(validate_schema(agent.respond()))
Tool Supply Chain Verification
Cryptographically verify all MCP servers, plugins, and third-party tools. Pin versions and review changelogs before updates.
verify_signature(tool, pubkey=TRUSTED_KEYS[tool.vendor])
Memory Isolation
Scope agent memory per user and session. Prevent cross-user memory contamination and validate content before storing.
memory.save(key=f"{user_id}:{session_id}", val=safe(data))

