Agent & Tool Safety Detection

AI agents that take autonomous actions introduce significant security risks. Oculum detects overpermissive tools, excessive agency, and unsafe action patterns.

Agent Security Risks

AI agents can:

Execute code, make API calls, access files
Chain multiple actions together
Operate with minimal human oversight
Be manipulated via prompt injection

This autonomy requires careful security controls.

Detectors

ai_agent_tools

Severity: Medium-High

Detects when AI agents have access to potentially dangerous tools.

Triggers on:

Code execution tools
File system access
Network/HTTP capabilities
Database operations
Shell command execution

Example Detection:

// DETECTED: Agent with dangerous tools
const agent = new Agent({
  tools: [
    new ShellTool(),        // Can execute any command
    new FileSystemTool(),   // Can read/write files
    new CodeExecutor()      // Can run arbitrary code
  ]
});
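Before running a scan, it can help to list which capabilities your agent's tools actually carry. The sketch below assumes a hypothetical `ToolSpec` shape in which each tool declares capability categories that mirror the trigger list above. It illustrates a capability audit and is not how Oculum performs detection.

```typescript
// Hypothetical tool descriptor: a name plus declared capability categories.
type Capability = 'code_exec' | 'filesystem' | 'network' | 'database' | 'shell' | 'search';

interface ToolSpec {
  name: string;
  capabilities: Capability[];
}

interface Finding {
  tool: string;
  capability: Capability;
}

// Categories mirroring the ai_agent_tools trigger list.
const DANGEROUS: ReadonlySet<Capability> = new Set<Capability>([
  'code_exec', 'filesystem', 'network', 'database', 'shell',
]);

// Return one finding per dangerous capability that a tool declares.
function auditTools(tools: ToolSpec[]): Finding[] {
  const findings: Finding[] = [];
  for (const tool of tools) {
    for (const capability of tool.capabilities) {
      if (DANGEROUS.has(capability)) {
        findings.push({ tool: tool.name, capability });
      }
    }
  }
  return findings;
}

const findings = auditTools([
  { name: 'shell', capabilities: ['shell'] },
  { name: 'fs', capabilities: ['filesystem'] },
  { name: 'web_search', capabilities: ['search'] },
]);
console.log(findings);
```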

ai_excessive_agency

Severity: High

Detects when agents can take high-impact actions without confirmation.

Triggers on:

No human-in-the-loop for destructive actions
Automatic execution without approval
Chained actions without breakpoints
Missing action limits

Example Vulnerable Code:

// VULNERABLE: No confirmation for dangerous actions
async function agentLoop(task: string) {
  while (!complete) {
    const action = await agent.planNextAction(task);
    await action.execute(); // No approval!
  }
}

ai_overpermissive_tool

Severity: High

Detects tools with overly broad capabilities.

Triggers on:

Read/write access to entire filesystem
Unrestricted network access
Admin-level database permissions
Wildcard API scopes

Example Vulnerable Code:

// VULNERABLE: Overly permissive tool
const fileTool = {
  name: "file_operations",
  permissions: {
    read: "/**/*",  // Can read anything!
    write: "/**/*"  // Can write anywhere!
  }
};
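A quick self-check for this pattern is to flag any read or write pattern that covers the whole filesystem. This minimal sketch assumes the hypothetical `permissions` shape from the example above. The set of "broad" patterns is an illustrative starting point, not an exhaustive list.

```typescript
// Hypothetical permissions shape matching the example above.
interface FilePermissions {
  read?: string | string[];
  write?: string | string[];
}

// Patterns that grant access to everything when used on their own.
const BROAD_PATTERNS = new Set(['*', '**', '/**', '/**/*', '**/*']);

function toList(p: string | string[] | undefined): string[] {
  if (p === undefined) return [];
  return Array.isArray(p) ? p : [p];
}

// Report each read/write pattern that grants filesystem-wide access.
function findBroadPatterns(perms: FilePermissions): string[] {
  const issues: string[] = [];
  for (const mode of ['read', 'write'] as const) {
    for (const pattern of toList(perms[mode])) {
      if (BROAD_PATTERNS.has(pattern.trim())) {
        issues.push(`${mode}: "${pattern}" grants filesystem-wide access`);
      }
    }
  }
  return issues;
}

console.log(findBroadPatterns({ read: '/**/*', write: ['/workspace/output/**'] }));
```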

Remediation

Principle of Least Privilege

// SAFE: Minimal permissions
const fileTool = {
  name: "file_operations",
  permissions: {
    read: ["/workspace/data/**"],  // Specific directory
    write: ["/workspace/output/**"],
    deny: ["**/*.env", "**/.ssh/**"]  // Explicit denials
  }
};

const agent = new Agent({
  tools: [fileTool],  // The restricted tool defined above
  maxActionsPerRun: 10,
  timeoutMs: 60000
});
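Declaring read, write, and deny globs only helps if something enforces them on every call. The sketch below assumes POSIX-style absolute paths and uses a small hand-rolled glob matcher that supports `**`, `*`, and `?`. A production tool would more likely use an established glob library. The sketch normalizes the path before matching so `../` segments cannot escape an allowed directory, and deny rules win over allow rules. It does not resolve symlinks. Resolving with `fs.realpath` before the check would close that gap.

```typescript
import { posix } from 'node:path';

// Convert a simple glob into an anchored RegExp.
// "**/" matches zero or more directories, "**" matches anything,
// "*" matches within one path segment, "?" matches one non-"/" character.
function globToRegExp(glob: string): RegExp {
  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === '*' && glob[i + 1] === '*') {
      if (glob[i + 2] === '/') {
        re += '(?:.*/)?';
        i += 2;
      } else {
        re += '.*';
        i += 1;
      }
    } else if (c === '*') {
      re += '[^/]*';
    } else if (c === '?') {
      re += '[^/]';
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, '\\$&');  // escape regex metacharacters
    }
  }
  return new RegExp(`^${re}$`);
}

interface PathPolicy {
  read: string[];
  write: string[];
  deny: string[];
}

// Deny rules win; otherwise the path must match an allow rule for the mode.
function isAllowed(policy: PathPolicy, mode: 'read' | 'write', rawPath: string): boolean {
  if (!posix.isAbsolute(rawPath)) return false;
  const p = posix.normalize(rawPath);  // collapse "../" before matching
  const matches = (globs: string[]) => globs.some(g => globToRegExp(g).test(p));
  if (matches(policy.deny)) return false;
  return matches(policy[mode]);
}

const policy: PathPolicy = {
  read: ['/workspace/data/**'],
  write: ['/workspace/output/**'],
  deny: ['**/*.env', '**/.ssh/**'],
};

console.log(isAllowed(policy, 'read', '/workspace/data/report.csv'));       // true
console.log(isAllowed(policy, 'read', '/workspace/data/../../etc/passwd')); // false
```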

Human-in-the-Loop

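// SAFE: Require approval for dangerous actions, with a hard action limit
const MAX_ACTIONS = 10;

async function agentLoop(task: string) {
  // Rejected actions count toward the limit, so replanning cannot loop forever
  for (let step = 0; step < MAX_ACTIONS; step++) {
    const action = await agent.planNextAction(task);
    if (action === null) break;  // Assumes the planner returns null when the task is done

    if (isDangerousAction(action)) {
      const approved = await requestHumanApproval(action);
      if (!approved) {
        await agent.replan("Action was rejected");
        continue;
      }
    }

    await action.execute();
  }
}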

function isDangerousAction(action: Action): boolean {
  return (
    action.type === 'file_write' ||
    action.type === 'shell_exec' ||
    action.type === 'api_call' ||
    action.type === 'database_mutation'
  );
}
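The same policy can be written as a small, testable wrapper around any planner. This sketch uses hypothetical `Planner`, `Approver`, and `Executor` callback types (not a specific framework's API) and assumes the planner returns `null` when the task is done. Unlike a list of known-dangerous types, it auto-approves only an explicit set of read-only action types, so a new or unexpected type falls through to human review. Rejected proposals also count against the action budget.

```typescript
// Hypothetical minimal action and callback shapes for illustration.
interface Action {
  type: string;
  params: Record<string, unknown>;
}

type Planner = () => Promise<Action | null>;  // null means the task is complete
type Approver = (action: Action) => Promise<boolean>;
type Executor = (action: Action) => Promise<void>;

// Read-only types that may run without approval; everything else,
// including types added later, needs a human decision (default-deny).
const AUTO_APPROVED = new Set(['search', 'read_file', 'list_directory']);

async function runGuarded(
  plan: Planner,
  approve: Approver,
  execute: Executor,
  maxActions = 10,
): Promise<{ executed: Action[]; rejected: Action[] }> {
  const executed: Action[] = [];
  const rejected: Action[] = [];
  // Every proposal, approved or rejected, consumes one step of the budget.
  for (let step = 0; step < maxActions; step++) {
    const action = await plan();
    if (action === null) break;
    if (!AUTO_APPROVED.has(action.type) && !(await approve(action))) {
      rejected.push(action);
      continue;
    }
    await execute(action);
    executed.push(action);
  }
  return { executed, rejected };
}

const queue: Action[] = [
  { type: 'read_file', params: { path: '/workspace/data/a.txt' } },
  { type: 'shell_exec', params: { cmd: 'rm -rf /workspace' } },
];
runGuarded(
  async () => queue.shift() ?? null,
  async () => false,  // a reviewer that rejects everything
  async () => {},     // stand-in executor
).then(({ executed, rejected }) => console.log(executed.length, rejected.length)); // prints "1 1"
```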

Action Allowlists

// SAFE: Only allow specific actions
const allowedActions = new Set([
  'search',
  'read_file',
  'list_directory',
  'calculate'
]);

async function executeAction(action: Action) {
  if (!allowedActions.has(action.type)) {
    throw new Error(`Action type '${action.type}' not permitted`);
  }

  // Validate parameters before dispatch
  validateActionParams(action);

  // `actions` maps each allowed type to its handler (defined elsewhere)
  return actions[action.type](action.params);
}
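`validateActionParams` is left abstract above. One way to fill it in is to reject unknown parameter keys and then check each value's type and range. The sketch below uses hypothetical per-type specs for `search` and `read_file` only. Action types without a spec are rejected rather than passed through.

```typescript
interface Action {
  type: string;
  params: Record<string, unknown>;
}

interface ParamSpec {
  keys: string[];                                    // the only keys allowed
  check: (params: Record<string, unknown>) => void;  // throws on invalid values
}

// Hypothetical specs; a real allowlist needs one entry per allowed type.
const paramSpecs: Record<string, ParamSpec> = {
  search: {
    keys: ['query'],
    check: (p) => {
      const q = p.query;
      if (typeof q !== 'string' || q.length === 0 || q.length > 500) {
        throw new Error('search.query must be a non-empty string (max 500 chars)');
      }
    },
  },
  read_file: {
    keys: ['path'],
    check: (p) => {
      const path = p.path;
      if (typeof path !== 'string' || !path.startsWith('/workspace/') || path.split('/').includes('..')) {
        throw new Error('read_file.path must stay inside /workspace/');
      }
    },
  },
};

function validateActionParams(action: Action): void {
  const spec = paramSpecs[action.type];
  if (!spec) {
    throw new Error(`No parameter spec for action type '${action.type}'`);
  }
  for (const key of Object.keys(action.params)) {
    if (!spec.keys.includes(key)) {
      throw new Error(`Unexpected parameter '${key}' for '${action.type}'`);
    }
  }
  spec.check(action.params);
}

validateActionParams({ type: 'search', params: { query: 'agent security' } });  // passes
try {
  validateActionParams({ type: 'read_file', params: { path: '/workspace/../etc/passwd' } });
} catch (err) {
  console.log((err as Error).message);  // read_file.path must stay inside /workspace/
}
```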

Sandboxed Execution

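// SAFE: Execute in sandbox
// Option and method names are illustrative; confirm them against the e2b SDK
// version you use (resource limits may be set on the sandbox template instead).
import { Sandbox } from 'e2b';

async function runInSandbox(agentCode: string) {
  const sandbox = await Sandbox.create({
    timeout: 30000,
    memory: '256mb',
    network: false  // No network access
  });

  try {
    return await sandbox.run(agentCode);
  } finally {
    await sandbox.kill();  // Tear down even if execution throws
  }
}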
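Independently of the sandbox provider, a caller-side wall-clock limit adds a second layer. This generic sketch races a promise against a timer using only the standard `Promise` and `setTimeout`. It stops waiting but does not cancel the underlying work, which is why the sandbox itself should still be killed in a `finally` block.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);  // don't leave the timer pending after success
  }
}

const slow = new Promise<string>(resolve => setTimeout(() => resolve('done'), 1000));
withTimeout(slow, 50).catch(err => console.log(err.message));  // "Timed out after 50 ms"
```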

Agent Architecture Patterns

Supervised Agent

User Request
     │
     ▼
┌─────────┐
│ Planner │ ← Plans actions
└────┬────┘
     │ Proposed actions
     ▼
┌─────────┐
│ Reviewer│ ← Human or automated review
└────┬────┘
     │ Approved actions
     ▼
┌─────────┐
│ Executor│ ← Sandboxed execution
└────┬────┘
     │ Results
     ▼
┌─────────┐
│ Verifier│ ← Check outcomes
└─────────┘
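The flow in the diagram can be wired as four plain functions so that each stage only receives what the previous stage approved. This is a synchronous sketch with hypothetical stage signatures. Real planners and executors would be asynchronous, and a failed verification might trigger rollback or escalation rather than just being recorded.

```typescript
// Hypothetical stage signatures for Planner → Reviewer → Executor → Verifier.
interface ProposedAction {
  type: string;
  target: string;
}

interface StageResult {
  action: ProposedAction;
  output: string;
  verified: boolean;
}

type Plan = (request: string) => ProposedAction[];
type Review = (action: ProposedAction) => boolean;
type Execute = (action: ProposedAction) => string;
type Verify = (action: ProposedAction, output: string) => boolean;

// The executor never receives an action that the reviewer did not approve.
function superviseRun(
  request: string,
  plan: Plan,
  review: Review,
  execute: Execute,
  verify: Verify,
): StageResult[] {
  const approved = plan(request).filter(review);
  return approved.map(action => {
    const output = execute(action);
    return { action, output, verified: verify(action, output) };
  });
}

const results = superviseRun(
  'summarize the report',
  () => [
    { type: 'read_file', target: '/workspace/data/report.txt' },
    { type: 'shell_exec', target: 'rm -rf /workspace' },
  ],
  a => a.type === 'read_file',     // automated reviewer: reads only
  a => `contents of ${a.target}`,  // stand-in executor
  (_a, out) => out.length > 0,     // trivial verifier
);
console.log(results.map(r => r.action.type));  // [ 'read_file' ]
```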

Capability-Based Security

// Define capabilities
const Capabilities = {
  READ_PUBLIC_FILES: 'read:public',
  WRITE_WORKSPACE: 'write:workspace',
  EXECUTE_SAFE_CODE: 'exec:safe',
  NETWORK_ALLOWLIST: 'net:allowlist'
};

// Grant minimal capabilities
const agentCapabilities = [
  Capabilities.READ_PUBLIC_FILES,
  Capabilities.WRITE_WORKSPACE
];

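// Check before action
function canPerformAction(action: Action, caps: string[]): boolean {
  // getRequiredCapabilities should fail closed (e.g., throw) for unknown
  // action types: [].every(...) is true, so an empty list grants the action
  const required = getRequiredCapabilities(action);
  return required.every(cap => caps.includes(cap));
}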
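`getRequiredCapabilities` is not defined above. A minimal self-contained version follows, assuming a hypothetical mapping from action types to capabilities. The design choice worth copying is failing closed: an action type with no mapping is denied, because `[].every(...)` returns `true` and would otherwise grant it.

```typescript
const Capabilities = {
  READ_PUBLIC_FILES: 'read:public',
  WRITE_WORKSPACE: 'write:workspace',
  EXECUTE_SAFE_CODE: 'exec:safe',
  NETWORK_ALLOWLIST: 'net:allowlist',
} as const;

type Capability = (typeof Capabilities)[keyof typeof Capabilities];

interface Action {
  type: string;
}

// Hypothetical mapping from action type to the capabilities it needs.
const REQUIRED: Record<string, Capability[]> = {
  read_file: [Capabilities.READ_PUBLIC_FILES],
  write_file: [Capabilities.WRITE_WORKSPACE],
  run_code: [Capabilities.EXECUTE_SAFE_CODE],
  http_get: [Capabilities.NETWORK_ALLOWLIST],
};

// null signals an unknown action type.
function getRequiredCapabilities(action: Action): Capability[] | null {
  return REQUIRED[action.type] ?? null;
}

// Unknown action types are denied rather than treated as needing nothing.
function canPerformAction(action: Action, caps: readonly Capability[]): boolean {
  const required = getRequiredCapabilities(action);
  if (required === null) return false;
  return required.every(cap => caps.includes(cap));
}

const agentCapabilities = [Capabilities.READ_PUBLIC_FILES, Capabilities.WRITE_WORKSPACE];
console.log(canPerformAction({ type: 'write_file' }, agentCapabilities));  // true
console.log(canPerformAction({ type: 'run_code' }, agentCapabilities));    // false
```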

Tool Safety Checklist

File System Tools

[ ] Restricted to specific directories
[ ] No access to sensitive paths (.env, .ssh, etc.)
[ ] Size limits on read/write
[ ] Audit logging

Code Execution Tools

[ ] Sandboxed environment
[ ] Time limits
[ ] Memory limits
[ ] No network access (or allowlisted)

API/Network Tools

[ ] Allowlisted endpoints only
[ ] Rate limiting
[ ] No credential exposure
[ ] Response size limits
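The first two items can be sketched with a hypothetical origin allowlist and a simple fixed-window rate limiter. The sketch compares parsed origins using the standard `URL` class rather than string prefixes, since a prefix check on `https://api.example.com` would also accept `https://api.example.com.evil.test`.

```typescript
// Hypothetical allowlist: exact origins the agent's HTTP tool may call.
const ALLOWED_ORIGINS = new Set(['https://api.example.com']);

function isAllowedUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false;  // not a valid absolute URL
  }
  return ALLOWED_ORIGINS.has(url.origin);  // scheme + host + port must match exactly
}

// Fixed-window limiter: at most `limit` calls per `windowMs`.
class RateLimiter {
  private windowStart = 0;
  private count = 0;

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  tryAcquire(now: number = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;  // start a new window
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}

const limiter = new RateLimiter(5, 60_000);  // 5 calls per minute
console.log(isAllowedUrl('https://api.example.com/v1/search'));  // true
console.log(isAllowedUrl('https://api.example.com.evil.test/'));  // false
console.log(limiter.tryAcquire());  // true
```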

Database Tools

[ ] Read-only by default
[ ] Query validation
[ ] Row limits
[ ] No schema modifications
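For query validation and row limits, a coarse pre-check might look like the sketch below. It is an illustration under assumptions: keyword matching on SQL is easy to get wrong, so it should sit on top of a read-only database role, not replace one. It allows a single `SELECT` statement, rejects common write and DDL keywords (including `SELECT ... INTO`), and wraps the query in an outer `LIMIT`. `LIMIT` syntax differs on some databases, such as SQL Server.

```typescript
const MAX_ROWS = 100;

// Write/DDL keywords rejected anywhere in the query (word-boundary match,
// so column names such as updated_at are not affected).
const FORBIDDEN = /\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|replace|attach|copy|into)\b/i;

function guardReadOnlyQuery(sql: string, maxRows = MAX_ROWS): string {
  const trimmed = sql.trim().replace(/;\s*$/, '').trim();  // allow one trailing semicolon
  if (trimmed.includes(';')) {
    throw new Error('Multiple statements are not allowed');
  }
  if (!/^select\b/i.test(trimmed)) {
    throw new Error('Only SELECT queries are allowed');
  }
  if (FORBIDDEN.test(trimmed)) {
    throw new Error('Query contains a forbidden keyword');
  }
  // Wrap rather than edit the query so an inner LIMIT cannot raise the cap.
  return `SELECT * FROM (${trimmed}) AS agent_q LIMIT ${maxRows}`;
}

console.log(guardReadOnlyQuery('SELECT id, email FROM users ORDER BY id;'));
// SELECT * FROM (SELECT id, email FROM users ORDER BY id) AS agent_q LIMIT 100
```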

Common Vulnerabilities

Vulnerability	Impact	Mitigation
Unrestricted file access	Data theft/corruption	Path restrictions
Arbitrary code execution	System compromise	Sandboxing
No action limits	Resource exhaustion	Rate limiting
Prompt injection → action	Unauthorized actions	Least privilege, human approval, input validation

Framework Examples

LangChain

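import { DynamicTool } from '@langchain/core/tools';
import { readFile } from 'node:fs/promises';

// Restricted tool. isWithinWorkspace / isSensitivePath are your own checks
// (not LangChain APIs); resolve the path before comparing so "../" cannot escape.
const fileTool = new DynamicTool({
  name: "read_file",
  description: "Read a file from the workspace",
  func: async (path: string) => {
    // Validate path before touching the filesystem
    if (!isWithinWorkspace(path)) {
      throw new Error('Path outside workspace');
    }
    if (isSensitivePath(path)) {
      throw new Error('Access denied');
    }
    return readFile(path, 'utf-8');
  }
});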

CrewAI

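from crewai import Agent

# Restricted agent (read_only_search is a placeholder for your own tool)
agent = Agent(
    role='Research Assistant',
    goal='Answer research questions using read-only search',
    backstory='A careful assistant that only reads public sources.',
    tools=[read_only_search],  # Limited tools
    max_iter=5,  # Cap on reasoning/tool-use iterations
    allow_delegation=False  # Cannot delegate tasks to other agents
)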

Related

Prompt Injection: Injection affecting agents
MCP Security: Tool protocol security
Unsafe Execution: Code execution risks
