# Agent & Tool Safety Detection
AI agents that take autonomous actions introduce significant security risks. Oculum detects overpermissive tools, excessive agency, and unsafe action patterns.
## Agent Security Risks
AI agents can:
- Execute code, make API calls, access files
- Chain multiple actions together
- Operate with minimal human oversight
- Be manipulated via prompt injection
This autonomy requires careful security controls.
## Detectors
### `ai_agent_tools`

**Severity:** Medium-High

Detects when AI agents have access to potentially dangerous tools.

**Triggers on:**
- Code execution tools
- File system access
- Network/HTTP capabilities
- Database operations
- Shell command execution
**Example Detection:**

```typescript
// DETECTED: Agent with dangerous tools
const agent = new Agent({
  tools: [
    new ShellTool(),      // Can execute any command
    new FileSystemTool(), // Can read/write files
    new CodeExecutor()    // Can run arbitrary code
  ]
});
```
### `ai_excessive_agency`

**Severity:** High

Detects when agents can take high-impact actions without confirmation.

**Triggers on:**
- No human-in-the-loop for destructive actions
- Automatic execution without approval
- Chained actions without breakpoints
- Missing action limits
**Example Vulnerable Code:**

```typescript
// VULNERABLE: No confirmation for dangerous actions
async function agentLoop(task: string) {
  let complete = false;
  while (!complete) {
    const action = await agent.planNextAction(task);
    await action.execute(); // No approval!
    complete = action.isFinal; // Hypothetical flag set by the planner
  }
}
```
### `ai_overpermissive_tool`

**Severity:** High

Detects tools with overly broad capabilities.

**Triggers on:**
- Read/write access to entire filesystem
- Unrestricted network access
- Admin-level database permissions
- Wildcard API scopes
**Example Vulnerable Code:**

```typescript
// VULNERABLE: Overly permissive tool
const fileTool = {
  name: "file_operations",
  permissions: {
    read: "/**/*",  // Can read anything!
    write: "/**/*"  // Can write anywhere!
  }
};
```
## Remediation

### Principle of Least Privilege
```typescript
// SAFE: Minimal permissions
const restrictedFileTool = {
  name: "file_operations",
  permissions: {
    read: ["/workspace/data/**"],    // Specific directory
    write: ["/workspace/output/**"],
    deny: ["**/*.env", "**/.ssh/**"] // Explicit denials
  }
};

const agent = new Agent({
  tools: [restrictedFileTool],
  maxActionsPerRun: 10, // Cap chained actions per run
  timeoutMs: 60000
});
```
### Human-in-the-Loop
```typescript
// SAFE: Require approval for dangerous actions
async function agentLoop(task: string) {
  let complete = false;
  while (!complete) {
    const action = await agent.planNextAction(task);
    if (isDangerousAction(action)) {
      const approved = await requestHumanApproval(action);
      if (!approved) {
        await agent.replan("Action was rejected");
        continue;
      }
    }
    await action.execute();
    complete = action.isFinal; // Hypothetical flag set by the planner
  }
}

function isDangerousAction(action: Action): boolean {
  return (
    action.type === 'file_write' ||
    action.type === 'shell_exec' ||
    action.type === 'api_call' ||
    action.type === 'database_mutation'
  );
}
```
### Action Allowlists
```typescript
// SAFE: Only allow specific actions
const allowedActions = new Set([
  'search',
  'read_file',
  'list_directory',
  'calculate'
]);

async function executeAction(action: Action) {
  if (!allowedActions.has(action.type)) {
    throw new Error(`Action type '${action.type}' not permitted`);
  }
  // Validate parameters before dispatching to the handler registry
  validateActionParams(action);
  return actions[action.type](action.params);
}
```
### Sandboxed Execution

```typescript
// SAFE: Execute in sandbox
import { Sandbox } from 'e2b';

async function runInSandbox(agentCode: string) {
  const sandbox = await Sandbox.create({
    timeout: 30000,
    memory: '256mb',
    network: false // No network access
  });
  try {
    return await sandbox.run(agentCode);
  } finally {
    await sandbox.kill();
  }
}
```
## Agent Architecture Patterns

### Supervised Agent
```text
User Request
     │
     ▼
┌─────────┐
│ Planner │  ← Plans actions
└────┬────┘
     │ Proposed actions
     ▼
┌─────────┐
│ Reviewer│  ← Human or automated review
└────┬────┘
     │ Approved actions
     ▼
┌─────────┐
│ Executor│  ← Sandboxed execution
└────┬────┘
     │ Results
     ▼
┌─────────┐
│ Verifier│  ← Check outcomes
└─────────┘
```
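A minimal sketch of this pipeline in TypeScript. The interfaces here are hypothetical placeholders for whatever planner, review, and sandbox components you actually use:

```typescript
// Hypothetical interfaces for the pipeline stages above
interface Action { type: string; params: Record<string, unknown>; }

interface Planner  { plan(task: string): Promise<Action[]>; }
interface Reviewer { approve(action: Action): Promise<boolean>; } // human or automated
interface Executor { run(action: Action): Promise<unknown>; }     // sandboxed
interface Verifier { check(action: Action, result: unknown): Promise<boolean>; }

async function supervisedRun(task: string, stages: {
  planner: Planner; reviewer: Reviewer; executor: Executor; verifier: Verifier;
}): Promise<void> {
  const { planner, reviewer, executor, verifier } = stages;
  for (const action of await planner.plan(task)) {
    if (!(await reviewer.approve(action))) continue; // rejected: skip (or replan)
    const result = await executor.run(action);
    if (!(await verifier.check(action, result))) {
      throw new Error(`Verification failed for '${action.type}'`);
    }
  }
}
```

The key property is that the executor never sees an action the reviewer has not approved, and the verifier runs before the next action is planned.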
### Capability-Based Security
```typescript
// Define capabilities
const Capabilities = {
  READ_PUBLIC_FILES: 'read:public',
  WRITE_WORKSPACE: 'write:workspace',
  EXECUTE_SAFE_CODE: 'exec:safe',
  NETWORK_ALLOWLIST: 'net:allowlist'
};

// Grant minimal capabilities
const agentCapabilities = [
  Capabilities.READ_PUBLIC_FILES,
  Capabilities.WRITE_WORKSPACE
];

// Check before action
function canPerformAction(action: Action, caps: string[]): boolean {
  const required = getRequiredCapabilities(action);
  return required.every(cap => caps.includes(cap));
}
```
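The example leaves `getRequiredCapabilities` undefined. A minimal sketch, assuming a static map from action types to the `Capabilities` constants above (the action type names are hypothetical), with unmapped actions denied by default:

```typescript
// Hypothetical map from action types to the capabilities they require
const CAPABILITY_MAP: Record<string, string[]> = {
  read_file:  [Capabilities.READ_PUBLIC_FILES],
  write_file: [Capabilities.WRITE_WORKSPACE],
  run_code:   [Capabilities.EXECUTE_SAFE_CODE],
  http_get:   [Capabilities.NETWORK_ALLOWLIST]
};

function getRequiredCapabilities(action: Action): string[] {
  // Unknown action types map to an impossible capability: deny by default
  return CAPABILITY_MAP[action.type] ?? ['__unmapped__'];
}
```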
## Tool Safety Checklist

### File System Tools

- [ ] Restricted to specific directories
- [ ] No access to sensitive paths (`.env`, `.ssh`, etc.)
- [ ] Size limits on read/write
- [ ] Audit logging
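A guarded read implementing these checks might look like the following (the workspace root, deny fragments, and size limit are illustrative):

```typescript
import fs from 'node:fs';
import path from 'node:path';

const WORKSPACE = '/workspace/data';     // allowed root (illustrative)
const DENY = ['.env', '.ssh', 'id_rsa']; // sensitive path fragments
const MAX_READ_BYTES = 1_000_000;        // size limit

function guardedRead(requested: string): string {
  // Resolve first so `../` sequences cannot escape the workspace
  const resolved = path.resolve(WORKSPACE, requested);
  if (!resolved.startsWith(WORKSPACE + path.sep)) {
    throw new Error('Path outside workspace');
  }
  if (DENY.some(frag => resolved.includes(frag))) {
    throw new Error('Access to sensitive path denied');
  }
  if (fs.statSync(resolved).size > MAX_READ_BYTES) {
    throw new Error('File exceeds read size limit');
  }
  console.info(`[audit] read ${resolved}`); // audit logging
  return fs.readFileSync(resolved, 'utf-8');
}
```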
### Code Execution Tools
- [ ] Sandboxed environment
- [ ] Time limits
- [ ] Memory limits
- [ ] No network access (or allowlisted)
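Time and memory limits can be enforced in-process with Node's `worker_threads`; note this is not a security sandbox by itself, so combine it with a container or a hosted sandbox like the e2b example above:

```typescript
import { Worker } from 'node:worker_threads';

// Run code in a worker with a memory cap and a wall-clock timeout.
// NOTE: worker_threads alone does not provide isolation from the host.
function runWithLimits(file: string, timeoutMs = 30_000): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(file, {
      resourceLimits: { maxOldGenerationSizeMb: 256 } // memory limit
    });
    const timer = setTimeout(() => {
      worker.terminate(); // time limit
      reject(new Error('Execution timed out'));
    }, timeoutMs);
    worker.once('message', result => { clearTimeout(timer); resolve(result); });
    worker.once('error', err => { clearTimeout(timer); reject(err); });
  });
}
```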
### API/Network Tools
- [ ] Allowlisted endpoints only
- [ ] Rate limiting
- [ ] No credential exposure
- [ ] Response size limits
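A sketch of the endpoint-allowlist and response-size checks (host names and limits are illustrative; rate limiting would wrap this function, and any credentials should be held by the tool rather than passed through from the model):

```typescript
const ALLOWED_HOSTS = new Set(['api.example.com']); // illustrative allowlist
const MAX_RESPONSE_BYTES = 1_000_000;

async function safeFetch(url: string): Promise<string> {
  const { hostname, protocol } = new URL(url);
  if (protocol !== 'https:' || !ALLOWED_HOSTS.has(hostname)) {
    throw new Error(`Endpoint not on allowlist: ${hostname}`);
  }
  // Send only tool-owned headers; never forward agent-held credentials
  const res = await fetch(url, { headers: { 'User-Agent': 'agent-tool' } });
  const body = await res.text();
  if (body.length > MAX_RESPONSE_BYTES) {
    throw new Error('Response exceeds size limit');
  }
  return body;
}
```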
### Database Tools
- [ ] Read-only by default
- [ ] Query validation
- [ ] Row limits
- [ ] No schema modifications
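One way to approximate these checks at the tool layer, assuming a generic `db.query` client. This is a coarse sketch: string inspection is no substitute for connecting with a genuinely read-only database role.

```typescript
declare const db: { query(sql: string): Promise<unknown[]> }; // stand-in client

const MAX_ROWS = 100;

// Coarse read-only gate: accept only a single SELECT statement
function assertReadOnly(sql: string): void {
  const stmt = sql.trim().toLowerCase();
  if (!stmt.startsWith('select') || stmt.includes(';')) {
    throw new Error('Only single SELECT statements are permitted');
  }
}

async function agentQuery(sql: string): Promise<unknown[]> {
  assertReadOnly(sql);
  // Row limit: wrap the query rather than trusting the model to add LIMIT
  return db.query(`SELECT * FROM (${sql}) AS sub LIMIT ${MAX_ROWS}`);
}
```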
## Common Vulnerabilities
| Vulnerability | Impact | Mitigation |
|---|---|---|
| Unrestricted file access | Data theft/corruption | Path restrictions |
| Arbitrary code execution | System compromise | Sandboxing |
| No action limits | Resource exhaustion | Rate limiting |
| Prompt injection → action | Unauthorized actions | Input validation |
## Framework Examples

### LangChain
```typescript
import { DynamicTool } from 'langchain/tools';
import fs from 'node:fs';

// Restricted tool (isWithinWorkspace/isSensitivePath are your own validators)
const fileTool = new DynamicTool({
  name: "read_file",
  description: "Read a file from the workspace",
  func: async (path: string) => {
    // Validate path
    if (!isWithinWorkspace(path)) {
      throw new Error('Path outside workspace');
    }
    if (isSensitivePath(path)) {
      throw new Error('Access denied');
    }
    return fs.readFileSync(path, 'utf-8');
  }
});
```
### CrewAI

```python
from crewai import Agent, Task, Crew

# Restricted agent
agent = Agent(
    role='Research Assistant',
    goal='Answer research questions using approved sources',
    backstory='A cautious assistant with read-only access',
    tools=[read_only_search],   # Limited tools
    max_iter=5,                 # Action limit
    allow_delegation=False      # No spawning sub-agents
)
```
## Related
- Prompt Injection — Injection affecting agents
- MCP Security — Tool protocol security
- Unsafe Execution — Code execution risks