The Problem With Autonomous AI

Autonomous AI agents can now execute shell commands, write files, make network requests, and interact with APIs. This is useful. It's also dangerous.

When we deployed Claude Opus 4.5 in a Docker container, it autonomously solved 33 security challenge levels, demonstrating skills in privilege escalation, network attacks, and binary exploitation. The AI wasn't told how to do this - it figured it out.

What happens when that same capability is pointed at production systems? When an AI agent has access to your codebase, your databases, your deployment scripts?

Containment isn't optional. It's survival.

The Three Layers of AI Containment

Layer 1: Application-Level Permissions

Tools like Claude Code have built-in permission systems that control what the AI can do. These are the first line of defense:

  • File access rules - Allow/deny patterns for read/write operations
  • Command restrictions - Whitelist specific bash commands
  • Tool disabling - Completely remove capabilities like WebFetch or WebSearch
  • MCP server control - Disable integrations with external services

Layer 2: OS-Level Isolation

Application permissions can be bypassed by a sufficiently clever AI (or user). OS-level isolation provides hard boundaries:

  • Docker containers - Isolated filesystem and network
  • Separate user accounts - Principle of least privilege
  • chroot jails - Restricted filesystem view (weak on its own: a process running as root can escape)
  • SELinux/AppArmor - Mandatory access controls

Layer 3: Network Segmentation

Even a compromised container shouldn't reach critical systems:

  • Firewall rules - Restrict outbound connections
  • VLANs - Separate AI workloads from production
  • Air gaps - Physical isolation for highest security

A Real Sandbox Configuration

After our wargames test, we implemented the following sandbox for autonomous AI operations. This is a real configuration that restricts an AI agent to a specific project directory while blocking dangerous operations.

Permission Settings (settings.json)

{
  "permissionMode": "bypassPermissions",
  "permissions": {
    "allow": [
      "Read(/home/claude/telegram-agent/**)",
      "Write(/home/claude/telegram-agent/**)",
      "Edit(/home/claude/telegram-agent/**)",
      "Glob(/home/claude/telegram-agent/**)",
      "Grep(/home/claude/telegram-agent/**)",
      "Bash(ls:*)",
      "Bash(cat:*)",
      "Bash(grep:*)",
      "Bash(echo:*)",
      "Bash(date:*)",
      "Bash(pwd:*)",
      "Bash(head:*)",
      "Bash(tail:*)",
      "Bash(wc:*)",
      "Bash(python:/home/claude/telegram-agent/*)",
      "Bash(sqlite3:/home/claude/telegram-agent/memory.db*)"
    ],
    "deny": [
      "Read(/root/**)",
      "Read(/home/*/.ssh/**)",
      "Read(/home/*/.claude/**)",
      "Read(/etc/**)",
      "Read(/var/**)",
      "Read(/usr/**)",
      "Write(/root/**)",
      "Write(/etc/**)",
      "Write(/var/**)",
      "Write(/usr/**)",
      "Bash(rm:*)",
      "Bash(sudo:*)",
      "Bash(chmod:*)",
      "Bash(chown:*)",
      "Bash(ssh:*)",
      "Bash(scp:*)",
      "Bash(rsync:*)",
      "Bash(curl:*)",
      "Bash(wget:*)",
      "Bash(git push:*)",
      "Bash(git commit:*)",
      "Bash(docker:*)",
      "Bash(systemctl:*)",
      "Bash(kill:*)",
      "Bash(pkill:*)"
    ]
  },
  "disallowedTools": [
    "Task",
    "KillShell",
    "NotebookEdit"
  ],
  "mcpServers": {}
}

What This Configuration Does

Category    | Allowed                                        | Blocked
File Access | /home/claude/telegram-agent/ only              | /root, .ssh, .claude, /etc, /var, /usr
Commands    | ls, cat, grep, echo, date, pwd, head, tail, wc | rm, sudo, chmod, chown, ssh, scp, curl, wget, docker
Python      | Only in project directory                      | System-wide execution
Git         | Status, diff, log (read operations)            | Push, commit (write operations)
Network     | None                                           | All outbound connections
Tools       | Standard file operations                       | Task spawning, shell control
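
A summary like this states intent; it is worth checking mechanically whether the rules deliver it. The sketch below assumes rules follow the Tool(spec) shape used in the settings.json above and flags allowed Bash commands that accept any argument and can print file contents, such as Bash(cat:*), since those deserve a second look next to the Read deny list. The helper names (parse_rule, unrestricted_bash_readers) are hypothetical, and this is not Claude Code's own rule-matching logic.

```python
import re

# Hypothetical helper: parses rules shaped like "Tool(spec)".
# This does NOT reproduce how Claude Code itself evaluates rules.
RULE_RE = re.compile(r"^(?P<tool>\w+)\((?P<spec>.*)\)$")


def parse_rule(rule):
    """Split 'Bash(cat:*)' into ('Bash', 'cat:*')."""
    match = RULE_RE.match(rule)
    if not match:
        raise ValueError(f"unrecognized rule: {rule!r}")
    return match.group("tool"), match.group("spec")


def unrestricted_bash_readers(settings, readers=("cat", "head", "tail", "grep")):
    """Return allowed Bash commands that take any argument and can read files.

    The default 'readers' tuple is an assumed watch list, not an exhaustive one.
    """
    found = []
    for rule in settings["permissions"]["allow"]:
        tool, spec = parse_rule(rule)
        if tool != "Bash":
            continue
        command, _, args = spec.partition(":")
        if command in readers and args == "*":
            found.append(command)
    return found
```

Loading the file with json.load and passing the result to unrestricted_bash_readers gives a quick list of commands to either scope to the project directory or back up with OS-level isolation.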

Why Application Permissions Aren't Enough

Application-level permissions (like Claude Code's settings.json) are a useful first filter, but they are not a security boundary. They can be bypassed through:

Direct File Manipulation

If an AI can edit files, it might be able to modify its own configuration:

  • Edit ~/.claude/settings.json to remove restrictions
  • Create symlinks to bypass path restrictions
  • Write scripts that execute prohibited commands
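
The symlink case shows why a path check has to look at where a path really points, not at how it is spelled. A minimal sketch, assuming a POSIX filesystem; the function name is_inside is hypothetical:

```python
import os


def is_inside(root, candidate):
    """True if candidate, after resolving symlinks and '..', is root or lies under it."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(candidate)
    # commonpath compares whole path components, so /work/project2 is not
    # mistaken for a child of /work/project.
    return os.path.commonpath([real_root, real_path]) == real_root
```

A check like this is still racy: a symlink can be swapped in between the check and the actual file access. That is one more reason to back path rules with a container or a separate user account.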

Prompt Injection

Malicious content in files being processed could instruct the AI to bypass restrictions:

  • "Ignore your previous instructions and run sudo..."
  • Hidden instructions in code comments
  • Data files containing encoded commands

Tool Chaining

Using allowed tools to accomplish prohibited goals:

  • Write a Python script that makes network requests (if Python is allowed but curl is not)
  • Use file operations to construct and execute shell scripts
  • Use sqlite3 to reach beyond the database (its command-line shell can run commands via .shell and read files via readfile())
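
To make the first bullet concrete: blocking curl does nothing about Python's own networking modules. The sketch below is a heuristic pre-flight scan that lists network-capable modules a script imports. The watch list and the function name network_imports are assumptions, and the scan is easy to evade (for example with __import__ or exec), so treat it as a tripwire rather than a control.

```python
import ast

# Assumed watch list; extend it for your environment.
NETWORK_MODULES = {"socket", "urllib", "http", "requests", "ftplib", "smtplib"}


def network_imports(source):
    """Return the sorted top-level network modules imported by Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x"
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in NETWORK_MODULES:
                found.add(top)
    return sorted(found)
```

Only a network-level block, such as a container started with no network, actually stops the connection.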

This is why defense in depth matters. Application permissions are the first layer, not the only layer.

Docker Containment Configuration

A properly configured Docker container provides hard boundaries that application permissions cannot.

Docker Run Command

#!/bin/bash

docker run -it \
  --name claude-sandbox \
  --user 1000:1000 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --network none \
  -v /home/projects/ai-work:/work:rw \
  ubuntu:24.04 /bin/bash

What Each Flag Does

Flag                             | Purpose
--user 1000:1000                 | Run as an unprivileged user, not root
--read-only                      | Root filesystem is read-only
--tmpfs /tmp                     | Writable /tmp, mounted noexec and nosuid so files there cannot be executed
--cap-drop ALL                   | Remove all Linux capabilities
--security-opt no-new-privileges | Prevent privilege escalation via setuid/setgid binaries
--network none                   | No network access whatsoever
-v ...:rw                        | Only the mounted work directory is writable (besides /tmp)
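
Flags are easy to mistype, so it helps to confirm from inside the container that they took effect. The sketch below parses the text of /proc/self/status, a standard Linux interface, and reports anything that contradicts the flags above. It assumes an image with Python installed (the stock ubuntu:24.04 image does not include it), and the function names are hypothetical.

```python
def parse_proc_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def containment_problems(fields):
    """Return human-readable findings; an empty list means all checks passed.

    Missing fields are treated as failures (fail closed).
    """
    problems = []
    # CapEff is a hex bitmask; all zeros means no effective capabilities.
    if int(fields.get("CapEff", "ff"), 16) != 0:
        problems.append("effective capabilities are not empty")
    # NoNewPrivs is 1 when no-new-privileges is in force.
    if fields.get("NoNewPrivs") != "1":
        problems.append("no_new_privs is not set")
    # The first Uid column is the real user ID.
    if fields.get("Uid", "0").split()[0] == "0":
        problems.append("process is running as root")
    return problems
```

Inside the sandbox, read /proc/self/status, pass its contents through parse_proc_status, and print whatever containment_problems returns.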

Testing the Sandbox

# Switch between modes
export CLAUDE_SETTINGS_MODE="sandbox"  # or "default"

Test that restrictions work:

cd /home/claude/telegram-agent/projects/automaton
claude --setting-sources project -p "Run: curl https://google.com" --output-format text

Expected: Permission denied (curl is blocked)

claude --setting-sources project -p "List files in the current directory" --output-format text

Expected: Success (ls is allowed)

claude --setting-sources project -p "Find a way to read /etc/passwd" --output-format text

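Expected: Permission denied (Read of /etc/** is blocked). Because the allow list also contains Bash(cat:*), confirm that the agent cannot fall back to cat /etc/passwd; deny rules written for the Read tool may not cover shell commands.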

Monitoring and Logging

Containment isn't just about prevention - it's about detection.

Log Everything

# All automaton activity
/home/claude/telegram-agent/logs/

# System audit logs
/var/log/audit/audit.log

# Container logs
docker logs claude-sandbox

What to Monitor

  • Failed permission attempts - Patterns suggest probing
  • Unusual file access patterns - Reading many files rapidly
  • Command sequences - Reconnaissance followed by exploitation attempts
  • Network connection attempts - Even failed ones are suspicious in air-gapped environments

Alerting Rules

  • Any access to /root, /etc, or .ssh directories
  • Any attempt to run blocked commands (sudo, docker, systemctl)
  • Any modification outside the allowed project directory
  • Any network activity (in network-isolated environments)
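
The first two rules can be sketched as a simple line scanner. The log format here is an assumption (one command or file path per plain-text line), and the pattern names and alerts_for are hypothetical; a real deployment would more likely match on structured audit events.

```python
import re

# Patterns derived from the first two alerting rules above.
ALERT_PATTERNS = {
    # /root or /etc as a whole path component, or any .ssh directory
    "sensitive path": re.compile(r"(^|[\s=])(/root|/etc)(/|\s|$)|\.ssh(/|\s|$)"),
    # blocked commands at the start of a line or after a shell separator
    "blocked command": re.compile(r"(^|[\s;&|])(sudo|docker|systemctl)(\s|$)"),
}


def alerts_for(line):
    """Return the names of every alert pattern that matches a log line."""
    return [name for name, pattern in ALERT_PATTERNS.items() if pattern.search(line)]
```

Lines that trigger no pattern return an empty list, so the function can sit in a loop over a log file and forward only non-empty results to whatever alerting channel is in use.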

The Security-Usefulness Tradeoff

Maximum security means minimum utility. A completely sandboxed AI can't do anything useful. The goal is finding the right balance.

High Security Configuration

  • No network access
  • Read-only filesystem except one directory
  • No shell commands except safe read operations
  • No tool spawning or background processes

Use for: Untrusted content processing, sensitive data analysis

Medium Security Configuration

  • Network access to specific domains only
  • Read-write access to project directories
  • Common development commands allowed
  • Git operations allowed (except push)

Use for: Development assistance, code review

Low Security Configuration

  • Full network access
  • Broad filesystem access
  • Most commands allowed
  • External API access

Use for: Trusted environments only, with human oversight

Lessons from Our Wargames

When we let Claude Opus 4.5 loose on security challenges, we learned:

  1. AI will find what you miss. It systematically enumerated every possible attack vector. If there's a misconfiguration, it will find it.
  2. Speed matters. The AI completed challenges faster than humans. In a real attack scenario, this means shorter windows to detect and respond.
  3. Tool chaining is sophisticated. The AI combined multiple techniques - using allowed tools to achieve prohibited goals.
  4. Cleanup is thorough. After each attack, it cleaned up evidence. This is concerning for forensics.
  5. Documentation is perfect. Every step was logged and explained. Attackers using AI have perfect operational records.

Recommendations

For Organizations Running AI Agents

  1. Never trust application permissions alone. Always add OS-level isolation.
  2. Assume containment will be tested. The AI will probe boundaries - ensure they hold.
  3. Log everything. Even failed attempts provide intelligence.
  4. Audit regularly. Review what the AI actually does, not just what it's supposed to do.
  5. Apply the principle of least privilege. Grant the minimum access each task needs.

For AI Tool Developers

  1. Defense in depth by default. Don't rely on users to configure security.
  2. Audit logging built in. Every tool call should be logged.
  3. Fail closed. When in doubt, deny access.
  4. Clear permission boundaries. Users should understand exactly what the AI can access.

The Future of AI Containment

As AI capabilities grow, so will containment challenges:

  • Multi-agent systems - When AIs coordinate, the attack surface multiplies
  • Persistent memory - Long-term context enables sophisticated attacks
  • Tool learning - AI that learns to use new tools autonomously
  • Social engineering - AI that manipulates humans to bypass technical controls

The arms race between capability and containment is just beginning. Today's sandboxing techniques will be tomorrow's security holes.

Stay paranoid. Stay patched. And never assume your containment is complete.