Security & Permissions

Claude Code handles sensitive operations: executing shell commands, reading credentials, writing to the filesystem. The permission system is a core safety layer that controls what Claude can do, when it needs user approval, and what is always forbidden.

Permission System Architecture

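The permission system is defined in types/permissions.ts and implemented across several files in utils/permissions/. It uses a layered model where permissions can be set at multiple scopes: a session-wide permission mode, persistent allow/deny rules loaded from settings, and per-call decisions made at runtime.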

Permission Modes

The PermissionMode type controls the overall disposition of the permission system for a session:

Mode	Description	Risk
default	Interactive mode. Every potentially dangerous operation prompts the user. The most conservative mode.	Low
acceptEdits	File edits are auto-approved; other operations (bash, web) still prompt. Useful for code review workflows.	Low-Medium
plan	Read-only mode: all write/execute tools are rejected. Claude can only read and plan. Model can enter this voluntarily via `EnterPlanModeTool`.	Very Low
auto	LLM-based classifier makes real-time decisions. Enabled when `TRANSCRIPT_CLASSIFIER` feature flag is on. Uses a 2-stage (fast + thinking) classifier.	Medium
bypassPermissions	All permission checks are skipped. Intended for headless CI/CD pipelines in isolated environments. Requires explicit configuration.	Very High
dontAsk	Never ask the user; follow allow/deny rules only. If a rule doesn't match, default to deny. For non-interactive sessions.	Medium
bubble	Internal-only mode for subagents. Permission decisions bubble up to the parent agent's permission handler.	Inherits parent

Never use bypassPermissions in a shared or production environment. This mode disables all safety checks and allows Claude to execute arbitrary shell commands, overwrite any file, and make outbound network requests without any confirmation. It is designed only for air-gapped CI environments where the code is already trusted.
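The mode table above can be read as a baseline decision that applies when no explicit rule matches. The following is a minimal sketch under stated assumptions: the `ToolKind` classification, the `Baseline` type, and the `baselineDecision` helper are hypothetical names that do not appear in the source, and treating reads as safe in `default` and `acceptEdits` is an assumption.

```typescript
// Hypothetical sketch: only the mode strings come from the source.
type PermissionMode =
  | 'default' | 'acceptEdits' | 'plan' | 'auto'
  | 'bypassPermissions' | 'dontAsk' | 'bubble'

type ToolKind = 'read' | 'edit' | 'execute'            // assumed classification
type Baseline = 'allow' | 'ask' | 'deny' | 'classify' | 'delegate'

// Baseline outcome when no allow/deny rule matches the call.
function baselineDecision(mode: PermissionMode, kind: ToolKind): Baseline {
  switch (mode) {
    case 'bypassPermissions':
      return 'allow'                                   // all checks skipped
    case 'plan':
      return kind === 'read' ? 'allow' : 'deny'        // read-only mode
    case 'acceptEdits':
      return kind === 'execute' ? 'ask' : 'allow'      // edits auto-approved
    case 'dontAsk':
      return 'deny'                                    // no rule matched: deny, never prompt
    case 'auto':
      return 'classify'                                // hand off to the classifier
    case 'bubble':
      return 'delegate'                                // parent agent decides
    case 'default':
      return kind === 'read' ? 'allow' : 'ask'         // assumption: reads are not prompted
  }
}
```

For example, `baselineDecision('plan', 'edit')` yields `'deny'`, while the same edit under `acceptEdits` yields `'allow'`.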

How canUseTool() Works

The canUseTool function is the central gateway for all permission decisions. It is defined in hooks/useCanUseTool.tsx and has the signature:

// hooks/useCanUseTool.tsx
export type CanUseToolFn = (
  tool: Tool,
  input: Record<string, unknown>,
  context: ToolUseContext,
  assistantMessage: AssistantMessage,
  toolUseID: string,
) => Promise<PermissionResult>

// PermissionResult is a discriminated union:
type PermissionResult =
  | { behavior: 'allow'; updatedInput?: Input; decisionReason?: ... }
  | { behavior: 'ask';   message: string; suggestions?: PermissionUpdate[] }
  | { behavior: 'deny';  message: string; decisionReason: ... }
  | { behavior: 'passthrough'; message: string }

When canUseTool returns a result with behavior 'ask', the REPL shows a permission dialog. The user can respond in one of five ways:

Allow once: permits this specific call only.
Allow always: adds a rule to alwaysAllowRules for this tool+pattern.
Deny once: rejects this specific call and returns an error to Claude.
Deny always: adds a rule to alwaysDenyRules for this tool+pattern.
Edit input: opens an editor to modify the tool's input before running it.
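The effect of the four allow/deny responses on the rule sets can be sketched as follows. This is an illustration, not the actual dialog handler: `DialogChoice`, `RuleSets`, and `applyChoice` are hypothetical names, only `alwaysAllowRules` and `alwaysDenyRules` come from the text above, and the "Edit input" response is omitted because it changes the input rather than the rules.

```typescript
// Hypothetical sketch of how a dialog response updates the rule sets.
type DialogChoice = 'allowOnce' | 'allowAlways' | 'denyOnce' | 'denyAlways'

interface RuleSets {
  alwaysAllowRules: readonly string[]
  alwaysDenyRules: readonly string[]
}

interface ChoiceOutcome {
  proceed: boolean   // whether this specific call runs
  rules: RuleSets    // rule sets after the choice is applied
}

function applyChoice(choice: DialogChoice, rule: string, rules: RuleSets): ChoiceOutcome {
  switch (choice) {
    case 'allowOnce':
      return { proceed: true, rules }                  // no rule is stored
    case 'allowAlways':
      return {
        proceed: true,
        rules: { ...rules, alwaysAllowRules: [...rules.alwaysAllowRules, rule] },
      }
    case 'denyOnce':
      return { proceed: false, rules }                 // no rule is stored
    case 'denyAlways':
      return {
        proceed: false,
        rules: { ...rules, alwaysDenyRules: [...rules.alwaysDenyRules, rule] },
      }
  }
}
```

The sketch returns a new `RuleSets` value instead of mutating its argument, which keeps the "once" and "always" cases easy to compare.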

Permission Rules

Rules are loaded at startup from several sources, ranging from persistent settings files to decisions made during the current session:

// Permission rule types (from types/permissions.ts)
type PermissionRuleSource =
  | 'userSettings'    // ~/.claude/settings.json
  | 'projectSettings' // .claude/settings.json in project dir
  | 'localSettings'   // .claude/settings.local.json (git-ignored)
  | 'flagSettings'    // --allow-tools CLI flag
  | 'policySettings'  // MDM-managed (enterprise)
  | 'cliArg'          // --dangerously-skip-permissions (deprecated)
  | 'command'         // /permissions command in REPL
  | 'session'         // Runtime: allow/deny decisions this session

// Example settings.json rules (the // comments are annotations only; strict JSON does not permit comments):
{
  "permissions": {
    "allow": [
      "Bash(git *)",          // Allow any git command
      "Bash(npm *)",          // Allow any npm command
      "FileEdit(src/**)"      // Allow edits to src/ directory
    ],
    "deny": [
      "Bash(rm -rf *)",       // Always deny dangerous rm
      "Bash(sudo *)"          // Deny all sudo commands
    ]
  }
}
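Each rule string in the example above has the shape `ToolName(pattern)`. A minimal parsing sketch, assuming that shape holds for every rule and that a bare tool name with no parentheses is also valid; `ParsedRule` and `parseRule` are hypothetical names, not functions from the source:

```typescript
// Hypothetical sketch: split "Bash(git *)" into a tool name and a pattern.
interface ParsedRule {
  toolName: string
  pattern: string | null   // null when the rule names a tool with no pattern
}

function parseRule(rule: string): ParsedRule | null {
  // Tool name, then an optional parenthesised pattern spanning the rest.
  const match = /^([A-Za-z0-9_]+)(?:\((.*)\))?$/.exec(rule.trim())
  if (match === null) return null
  const toolName = match[1]
  if (toolName === undefined) return null
  return { toolName, pattern: match[2] ?? null }
}
```

Under these assumptions, `parseRule('Bash(git *)')` yields the tool name `Bash` with pattern `git *`, and a malformed string such as `Bash(` yields `null`.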

Pattern Matching

Permission rules use glob-like patterns. Each tool implements preparePermissionMatcher(input) to define how its patterns are matched:

BashTool: Bash(git *) matches any bash command starting with git. Uses shell-word parsing, not simple string matching, to prevent bypass via quoting tricks.
FileEditTool / FileReadTool: FileEdit(src/**) uses path glob matching against the resolved absolute path.
MCPTool: mcp__server__tool(*) matches all calls to a specific MCP tool.
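To illustrate why word-level matching is safer than string-prefix matching, here is a simplified sketch. It is not the actual `preparePermissionMatcher` implementation: `shellWords` and `matchesBashPattern` are hypothetical helpers, the tokenizer handles only spaces and quotes (no escapes, substitutions, or operators), and only a trailing `*` wildcard is supported.

```typescript
// Hypothetical sketch: split a command into shell words, honoring quotes.
function shellWords(command: string): string[] {
  const words: string[] = []
  let current = ''
  let inWord = false
  let quote: string | null = null
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i)
    if (quote !== null) {
      if (ch === quote) quote = null   // closing quote: stay in the same word
      else current += ch
    } else if (ch === '"' || ch === "'") {
      quote = ch
      inWord = true
    } else if (ch === ' ' || ch === '\t') {
      if (inWord) { words.push(current); current = ''; inWord = false }
    } else {
      current += ch
      inWord = true
    }
  }
  if (inWord) words.push(current)
  return words
}

// Compare pattern and command word by word; a trailing "*" matches any remaining words.
function matchesBashPattern(pattern: string, command: string): boolean {
  const patternWords = shellWords(pattern)
  const commandWords = shellWords(command)
  const wildcardTail = patternWords[patternWords.length - 1] === '*'
  const fixed = wildcardTail ? patternWords.slice(0, -1) : patternWords
  if (commandWords.length < fixed.length) return false
  if (!wildcardTail && commandWords.length !== fixed.length) return false
  return fixed.every((word, i) => commandWords[i] === word)
}
```

With this approach `git *` matches `git status` and the quoted form `g"it" push`, but not `gitk --all`, which a naive `startsWith('git')` check would accept.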

Auto Mode Classifier

The auto permission mode uses a 2-stage classifier (TRANSCRIPT_CLASSIFIER feature flag) to make real-time security decisions:

// types/permissions.ts - classifier result type
type YoloClassifierResult = {
  thinking?: string          // Extended thinking from stage 2
  shouldBlock: boolean       // Final decision
  reason: string             // Human-readable explanation
  unavailable?: boolean      // API error - fallback to prompting
  transcriptTooLong?: boolean // Context window exceeded

  // Stage tracking
  stage?: 'fast' | 'thinking'
  stage1Usage?: ClassifierUsage
  stage1DurationMs?: number
  stage1RequestId?: string
  stage2Usage?: ClassifierUsage
  stage2DurationMs?: number
}

// Stage 1: Fast classifier (low latency, heuristics)
// Stage 2: Thinking classifier (if stage 1 is uncertain)
// If classifier is unavailable → fall back to user prompt

The classifier is designed to approve safe operations (reading files, running tests, standard git commands) automatically while blocking dangerous patterns (deleting system files, exfiltrating credentials, making unauthorized network requests). It uses the full conversation transcript as context so it can reason about the intent of an action, not just its form.
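The two-stage flow with its fallback to prompting can be sketched as below. This is an assumption-laden illustration: the `Stage` function type, the `confident` flag, and the `classifyAction` helper are hypothetical, and the real classifier's criteria for escalating from stage 1 to stage 2 are not described in the source.

```typescript
// Hypothetical sketch of the fast -> thinking -> prompt fallback chain.
interface StageVerdict {
  shouldBlock: boolean
  confident: boolean   // assumed signal for "stage 1 is uncertain"
  reason: string
}

type Stage = (transcript: string, action: string) => Promise<StageVerdict>

type Outcome =
  | { kind: 'allow' | 'block'; stage: 'fast' | 'thinking'; reason: string }
  | { kind: 'prompt'; reason: string }

async function classifyAction(
  fast: Stage,
  thinking: Stage,
  transcript: string,
  action: string,
): Promise<Outcome> {
  try {
    const first = await fast(transcript, action)
    if (first.confident) {
      return { kind: first.shouldBlock ? 'block' : 'allow', stage: 'fast', reason: first.reason }
    }
    // Stage 1 was uncertain: escalate to the slower thinking classifier.
    const second = await thinking(transcript, action)
    return { kind: second.shouldBlock ? 'block' : 'allow', stage: 'thinking', reason: second.reason }
  } catch {
    // Classifier unavailable (for example an API error): fall back to asking the user.
    return { kind: 'prompt', reason: 'classifier unavailable' }
  }
}
```

Passing both stages in as functions keeps the control flow testable without a network call; in this sketch any thrown error is treated as "unavailable".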

Bridge Security Model

The bridge system allows Claude Code to communicate with IDE extensions and other local clients. Security is enforced via JWT tokens and origin validation:

// Bridge security checks (from services/mcp/)
// 1. JWT validation
//    Every bridge message includes a signed JWT. The bridge server
//    validates the signature against a per-session key stored in
//    the keychain or temp dir. Unsigned/invalid messages are rejected.

// 2. Origin allowlist
//    The channelAllowlist.ts file defines which origins are permitted
//    to connect to the bridge. Requests from unlisted origins are
//    rejected before any message processing.

// 3. Permissions safety check
//    Even after authentication, bridged tool calls go through the
//    full canUseTool() pipeline. The bridge cannot bypass permissions.

// 4. Cross-machine prevention
//    Bridge messages include the local machine's hostname. If the
//    hostname doesn't match, it's classified as a 'safetyCheck'
//    deny with classifierApprovable: false - the classifier cannot
//    override this, only the user can.
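The admission checks above can be sketched as a short gate that runs before `canUseTool()`. All names here (`BridgeMessage`, `BridgeConfig`, `admitBridgeMessage`) are hypothetical, JWT verification is represented by an injected `verifyToken` stand-in instead of real signature validation, and checking the origin first is an assumption based on the statement that unlisted origins are rejected before any message processing.

```typescript
// Hypothetical sketch of the bridge admission checks; not the real bridge code.
interface BridgeMessage {
  origin: string
  token: string
  hostname: string
}

interface BridgeConfig {
  allowedOrigins: readonly string[]
  localHostname: string
  verifyToken: (token: string) => boolean   // stand-in for JWT signature validation
}

type BridgeVerdict =
  | { accepted: true }
  | { accepted: false; rejectedBy: 'origin' | 'jwt' }
  | { accepted: false; rejectedBy: 'safetyCheck'; classifierApprovable: false }

function admitBridgeMessage(msg: BridgeMessage, config: BridgeConfig): BridgeVerdict {
  if (!config.allowedOrigins.includes(msg.origin)) {
    return { accepted: false, rejectedBy: 'origin' }
  }
  if (!config.verifyToken(msg.token)) {
    return { accepted: false, rejectedBy: 'jwt' }
  }
  if (msg.hostname !== config.localHostname) {
    // Cross-machine message: only the user, not the classifier, may override.
    return { accepted: false, rejectedBy: 'safetyCheck', classifierApprovable: false }
  }
  return { accepted: true }   // the tool call still goes through canUseTool()
}
```

An accepted message is only admitted to the pipeline; it gains no permissions by passing this gate.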

API Key Handling

Claude Code supports several authentication mechanisms, each stored differently:

Auth Method	Storage	Notes
ANTHROPIC_API_KEY env var	Process environment	Highest priority; used in CI/CD pipelines
claude.ai OAuth	macOS Keychain / libsecret (Linux)	Preferred for interactive use; prefetched at startup
Legacy API key	macOS Keychain / libsecret	For users who set up key before OAuth was available
AWS Bedrock credentials	AWS credential chain	For enterprise Bedrock deployments
GCP Vertex credentials	GCP ADC / service account	For enterprise Vertex deployments

The startKeychainPrefetch() function fires both keychain reads (OAuth + legacy) in parallel at startup, before any other initialization, to avoid the ~65ms per-read latency from blocking the main flow.
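The parallel-read idea can be sketched as follows. This is not the body of `startKeychainPrefetch()`: the `KeychainReader` type, the `PrefetchedCredentials` shape, and `startPrefetch` are hypothetical, and the readers are injected so the sketch does not depend on a real keychain.

```typescript
// Hypothetical sketch: start both keychain reads at once and join them later.
type KeychainReader = () => Promise<string | null>

interface PrefetchedCredentials {
  oauthToken: string | null
  legacyApiKey: string | null
}

function startPrefetch(
  readOAuth: KeychainReader,
  readLegacy: KeychainReader,
): Promise<PrefetchedCredentials> {
  // Both reads are started before either is awaited, so their latencies overlap.
  const oauth = readOAuth()
  const legacy = readLegacy()
  return Promise.all([oauth, legacy]).then(([oauthToken, legacyApiKey]) => ({
    oauthToken,
    legacyApiKey,
  }))
}
```

A caller would invoke `startPrefetch` as early as possible, continue with other initialization, and await the returned promise only when credentials are first needed. With two reads of roughly 65ms each, overlapping them costs about one read's latency instead of two.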

Claude Code itself never writes API keys to plaintext files. The keychain holds OAuth tokens and legacy API keys; raw API keys set via environment variable are never persisted by the tool. If you add ANTHROPIC_API_KEY to a .env file in a project, Claude Code will pick it up, but that is your decision; the tool does not write keys to files.

Input Sanitization Patterns

Claude Code applies several layers of input sanitization before tool execution:

Zod Schema Validation

Every tool's inputSchema is a Zod schema. The query pipeline validates all tool input against this schema before calling validateInput() or checkPermissions(). Invalid inputs are rejected with a descriptive error returned to the model.
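The ordering described above (schema first, then `validateInput()`, then `checkPermissions()`) can be sketched without depending on Zod itself. `SchemaLike` below mirrors only the shape of Zod's `safeParse` result; `ToolLike`, `runPreflight`, and the convention that `validateInput` returns `null` when the input is valid are assumptions made for illustration.

```typescript
// Hypothetical sketch of the validation order; SchemaLike mimics Zod's safeParse shape.
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } }

interface SchemaLike<T> {
  safeParse(input: unknown): ParseResult<T>
}

interface ToolLike<T> {
  inputSchema: SchemaLike<T>
  validateInput(input: T): string | null            // assumed: null means valid
  checkPermissions(input: T): 'allow' | 'ask' | 'deny'
}

type PreflightResult =
  | { stage: 'schema' | 'validateInput'; error: string }
  | { stage: 'permissions'; behavior: 'allow' | 'ask' | 'deny' }

function runPreflight<T>(tool: ToolLike<T>, rawInput: unknown): PreflightResult {
  const parsed = tool.inputSchema.safeParse(rawInput)
  if (parsed.success === false) {
    return { stage: 'schema', error: parsed.error.message }   // returned to the model
  }
  const problem = tool.validateInput(parsed.data)
  if (problem !== null) {
    return { stage: 'validateInput', error: problem }
  }
  return { stage: 'permissions', behavior: tool.checkPermissions(parsed.data) }
}
```

Because the schema runs first, `validateInput` and `checkPermissions` only ever see input that already has the expected type.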

Path Normalization

File paths in tool inputs are resolved to absolute paths and validated against the current working directory and any additional allowed directories. Attempts to path-traverse outside allowed directories are caught as safety check violations with classifierApprovable: false.
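A containment check of this kind can be sketched with plain string handling. This is a simplified illustration for POSIX-style paths only: `resolvePosix` and `isInsideAllowedDirs` are hypothetical helpers, and the sketch does not touch the filesystem, so it does not resolve symlinks as a real implementation would need to.

```typescript
// Hypothetical sketch: resolve "." and ".." segments without filesystem access.
function resolvePosix(cwd: string, target: string): string {
  const combined = target.startsWith('/') ? target : `${cwd}/${target}`
  const stack: string[] = []
  for (const segment of combined.split('/')) {
    if (segment === '' || segment === '.') continue
    if (segment === '..') stack.pop()   // popping an empty stack stays at root
    else stack.push(segment)
  }
  return '/' + stack.join('/')
}

// True when the resolved target is one of the allowed directories or lies beneath one.
function isInsideAllowedDirs(cwd: string, target: string, allowedDirs: readonly string[]): boolean {
  const resolved = resolvePosix(cwd, target)
  return allowedDirs.some((dir) => {
    const root = resolvePosix('/', dir)
    if (root === '/') return true
    // Compare with a trailing slash so "/work/app" does not admit "/work/application".
    return resolved === root || resolved.startsWith(root + '/')
  })
}
```

Resolving before comparing is what defeats traversal: `../secrets` relative to `/work/app` becomes `/work/secrets`, which fails the prefix test.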

Bash Command Safety

BashTool checks each command in several stages before running it:

// BashTool safety checks (simplified):
// 1. Command parsing - splits compound commands to check each sub-command
// 2. Dangerous pattern detection:
//    - Commands that could exfiltrate data (curl | sh, wget + exec)
//    - Recursive deletes (rm -rf /, rm -rf ~)
//    - Credential reads (cat ~/.ssh/id_rsa, cat ~/.aws/credentials)
//    - Shell config writes (.bashrc, .zshrc)
// 3. Working directory enforcement
//    - Commands are run with CWD set to the session's working directory
//    - cd .. chaining is detected and flagged
// 4. Timeout enforcement
//    - Default 2-minute timeout per command
//    - Long-running commands require explicit user approval
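Steps 1 and 2 above can be sketched as follows. This is a heavily simplified illustration, not BashTool's actual checker: `splitCompound`, `DANGEROUS_PATTERNS`, and `findDangerous` are hypothetical, the splitter ignores escapes, subshells, heredocs, and redirections such as `2>&1`, and the three patterns cover only a few of the listed cases (pipe-to-shell detection is omitted because it needs to look at adjacent sub-commands together).

```typescript
// Hypothetical sketch: split a compound command on ;, &, &&, |, || outside quotes.
function splitCompound(command: string): string[] {
  const parts: string[] = []
  let current = ''
  let quote: string | null = null
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i)
    if (quote !== null) {
      current += ch
      if (ch === quote) quote = null
      continue
    }
    if (ch === '"' || ch === "'") {
      quote = ch
      current += ch
      continue
    }
    if (ch === ';' || ch === '|' || ch === '&') {
      // Treat "&&" and "||" as a single operator.
      if (ch !== ';' && command.charAt(i + 1) === ch) i++
      parts.push(current)
      current = ''
      continue
    }
    current += ch
  }
  parts.push(current)
  return parts.map((p) => p.trim()).filter((p) => p.length > 0)
}

// Illustrative patterns only; a real checker would be far more thorough.
const DANGEROUS_PATTERNS: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: 'recursive delete', pattern: /^rm\s+-rf\s+(\/|~)$/ },
  { label: 'credential read', pattern: /^cat\s+~\/\.(ssh|aws)\// },
  { label: 'shell config write', pattern: />>?\s*~\/\.(bashrc|zshrc)$/ },
]

// Check every sub-command, so a dangerous command cannot hide behind a safe one.
function findDangerous(command: string): string[] {
  const hits: string[] = []
  for (const sub of splitCompound(command)) {
    for (const { label, pattern } of DANGEROUS_PATTERNS) {
      if (pattern.test(sub)) hits.push(label)
    }
  }
  return hits
}
```

Splitting first is the important part: `git status && rm -rf /` begins with an innocuous command, and only per-sub-command checking flags the second half.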

MCP Tool Sanitization

MCP tools are treated with extra caution because their schemas come from external servers:

MCP tool names are normalized via mcpStringUtils.ts to prevent injection via malicious tool names.
MCP server connections require explicit user approval (via mcpServerApproval.tsx) before any tools are exposed.
Each MCP server's tools are prefixed with the server name (mcp__servername__toolname) to prevent tool name collisions.
MCP tool results are subject to the same maxResultSizeChars overflow protection as built-in tools.
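The naming scheme in the list above can be sketched as below. The normalization rule shown (replace anything outside letters, digits, underscore, and hyphen with an underscore) is an assumption, since the source does not describe what mcpStringUtils.ts actually does; `normalizeNamePart`, `buildMcpToolName`, and `parseMcpToolName` are hypothetical names.

```typescript
// Hypothetical sketch of MCP tool naming; the normalization rule is an assumption.
function normalizeNamePart(name: string): string {
  return name.replace(/[^A-Za-z0-9_-]/g, '_')
}

// Build the prefixed name: mcp__<server>__<tool>.
function buildMcpToolName(server: string, tool: string): string {
  return `mcp__${normalizeNamePart(server)}__${normalizeNamePart(tool)}`
}

// Split a prefixed name back into server and tool; returns null for non-MCP names.
function parseMcpToolName(fullName: string): { server: string; tool: string } | null {
  const parts = fullName.split('__')
  if (parts.length < 3 || parts[0] !== 'mcp') return null
  const server = parts[1]
  if (server === undefined || server === '') return null
  // Everything after the server belongs to the tool name.
  return { server, tool: parts.slice(2).join('__') }
}
```

One limitation of this sketch: a server name that itself contains `__` would be split at the wrong place, so a real implementation would need to rule that out or parse against the list of known servers.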

Denial Tracking

The permission system tracks how many times tool calls have been denied in a session via DenialTrackingState. When denials accumulate beyond a threshold, the system switches from automatic denial to interactive prompting, preventing infinite loops where Claude repeatedly requests a denied operation without realizing it.

// utils/permissions/denialTracking.ts
type DenialTrackingState = {
  denialCount: Map<string, number>  // tool name → denial count
  lastDenialTime: Map<string, number>
}

// Async subagents use a localDenialTracking field on ToolUseContext
// because their setAppState is a no-op (doesn't reach root store).
// Without this, the fallback-to-prompting threshold is never reached
// and the agent keeps silently denying the same operation.
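The fallback behaviour described above can be sketched as two small helpers over `DenialTrackingState`. The threshold value and the helper names (`DENIAL_THRESHOLD`, `recordDenial`, `shouldFallBackToPrompt`) are hypothetical; the source does not state the actual threshold or whether the comparison is inclusive.

```typescript
// Hypothetical sketch of denial counting and the fallback-to-prompt check.
interface DenialTrackingState {
  denialCount: Map<string, number>      // tool name -> denial count
  lastDenialTime: Map<string, number>   // tool name -> timestamp of last denial
}

const DENIAL_THRESHOLD = 3   // assumed value, for illustration only

// Return a new state with one more denial recorded for the tool.
function recordDenial(state: DenialTrackingState, toolName: string, now: number): DenialTrackingState {
  const denialCount = new Map(state.denialCount)
  const lastDenialTime = new Map(state.lastDenialTime)
  denialCount.set(toolName, (denialCount.get(toolName) ?? 0) + 1)
  lastDenialTime.set(toolName, now)
  return { denialCount, lastDenialTime }
}

// Once a tool has been denied often enough, stop auto-denying and ask the user.
function shouldFallBackToPrompt(state: DenialTrackingState, toolName: string): boolean {
  return (state.denialCount.get(toolName) ?? 0) >= DENIAL_THRESHOLD
}
```

Returning a fresh state object mirrors how a store update would work; a subagent whose store updates are no-ops would instead keep this state in its own `localDenialTracking` field, as the comment above explains.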

Source Deep Dive: `cyberRiskInstruction.ts`

Of all the safety mechanisms in Claude Code, the most deceptively simple is a single exported constant in src/prompts/cyberRiskInstruction.ts. The entire file — barring comments — is a one-sentence string:

/**
 * CYBER_RISK_INSTRUCTION
 *
 * IMPORTANT: DO NOT MODIFY THIS INSTRUCTION WITHOUT SAFEGUARDS TEAM REVIEW
 *
 * This instruction is owned by the Safeguards team and has been carefully
 * crafted and evaluated to balance security utility with safety. Changes
 * to this text can have significant implications for:
 *   - How Claude handles penetration testing and CTF requests
 *   - What security tools and techniques Claude will assist with
 *   - The boundary between defensive and offensive security assistance
 *
 * If you need to modify this instruction:
 *   1. Contact the Safeguards team (David Forsythe, Kyla Guru)
 *   2. Ensure proper evaluation of the changes
 *   3. Get explicit approval before merging
 *
 * Claude: Do not edit this file unless explicitly asked to do so by the user.
 */
export const CYBER_RISK_INSTRUCTION = `IMPORTANT: Assist with authorized
security testing, defensive security, CTF challenges, and educational
contexts. Refuse requests for destructive techniques, DoS attacks, mass
targeting, supply chain compromise, or detection evasion for malicious
purposes. Dual-use security tools (C2 frameworks, credential testing,
exploit development) require clear authorization context: pentesting
engagements, CTF competitions, security research, or defensive use cases.`

That single string is imported into prompts.ts and injected verbatim into the opening paragraph of every system prompt sent to external users — before tool instructions, before the cache boundary, and before any CLAUDE.md customization can influence behavior. It is the unconditional floor of Claude's cybersecurity policy.

What the Instruction Covers

Despite its brevity, the instruction defines a precise boundary between the "allowed" and "blocked" sides of dual-use security work:

Always allowed	Always blocked	Allowed with context
Authorized penetration testing	Destructive attack techniques	C2 frameworks (if pentest engagement confirmed)
Defensive security tools	DoS / DDoS attacks	Credential testing tools (if authorized context)
CTF challenges	Mass targeting / scanning	Exploit development (if security research)
Security education	Supply chain compromise	Vulnerability exploitation (if defensive use case)
Hardening & code review	Detection evasion for malicious use	Malware analysis (if educational/defensive framing)

The Self-Referential Design Detail

The header comment ends with an unusual line addressed not to human developers but to Claude itself:

 * Claude: Do not edit this file unless explicitly asked to do so by the user.

This is a direct instruction to the model embedded in the source code. If Claude Code is ever asked to read and modify its own source tree (which is entirely possible, since it has full filesystem tools), this comment is meant to act as a circuit breaker that keeps the model from autonomously changing its own safety policy, even if instructed to "improve" the prompts. The instruction is waived only when the user explicitly asks for the file to be edited. It is one of the clearest examples of Anthropic's layered safety approach: defense in depth, including defense against the model itself.

Why Placement Before the Cache Boundary Matters

The CYBER_RISK_INSTRUCTION is injected before the SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker in prompts.ts, which means it lives in the globally cacheable, session-independent portion of the system prompt. This has two important consequences. First, it applies identically to every session — there is no per-user, per-project, or per-CLAUDE.md variant of this policy. Second, because it's in the static prefix, modifying it would immediately invalidate the global prompt cache for every user on every Anthropic-hosted deployment simultaneously. The Safeguards team gate documented in the comments isn't bureaucratic friction; it's a deliberate structural cost that makes casual edits expensive enough to deter them.
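The relationship between the boundary marker and the cacheable prefix can be sketched as below. The marker's literal value, the section contents, and the helpers `buildSystemPrompt` and `cacheablePrefix` are all hypothetical; the source names the `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` constant but does not show how prompts.ts assembles the prompt.

```typescript
// Hypothetical sketch: static sections precede the boundary, dynamic ones follow it.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '<<dynamic-boundary>>'   // assumed marker value

function buildSystemPrompt(
  staticSections: readonly string[],
  dynamicSections: readonly string[],
): string {
  return [...staticSections, SYSTEM_PROMPT_DYNAMIC_BOUNDARY, ...dynamicSections].join('\n\n')
}

// The portion of the prompt that is identical across sessions and can be cached.
function cacheablePrefix(prompt: string): string {
  const index = prompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  return index === -1 ? prompt : prompt.slice(0, index)
}
```

In this sketch, changing a dynamic section (for example project-specific instructions) leaves `cacheablePrefix` unchanged, while changing any static section, such as the cyber-risk instruction, changes the prefix and therefore any cache keyed on it.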

Security Considerations

When working with or deploying Claude Code, keep these considerations in mind:

Prompt injection via file contents: If Claude reads a file that contains adversarial instructions ("Ignore previous instructions and..."), the model may be manipulated into requesting unauthorized actions. The permission system is the defense layer: even if the model is deceived into requesting a dangerous action, canUseTool() still requires approval before execution, unless an allow rule or a permissive mode already covers that action.

Auto-approve rules are persistent: When you click "Allow always" for a tool+pattern, the rule is written to your settings file and applies to all future sessions. Review your ~/.claude/settings.json periodically to audit accumulated allow rules.

Project settings are shared: Rules in .claude/settings.json are committed to version control and apply to everyone who uses Claude Code in that project. Be thoughtful about what you allow at the project level, because these rules become the defaults for all contributors.

Frequently Asked Questions

Is it safe to give Claude bash access?

Bash access is gated by the permission system, which resolves each call to allow, ask, or deny. In default mode you are prompted before any potentially dangerous command unless an allow rule already covers it.

Does the tool system circumvent permissions?

No. Every tool execution goes through canUseTool(), which evaluates the current PermissionMode and the configured allow/deny rules before the tool runs. The one exception is bypassPermissions mode, which skips these checks by design.

What is the isolated execution mode?

For advanced use cases, there are mechanisms designed to sandbox executions, ensuring that rogue commands do not compromise your entire file system.

Can I auto-approve specific folders?

Yes. A path-scoped allow rule such as FileEdit(src/**) auto-approves that tool inside the matching directory, which removes repeated prompts for routine edits.

How does OAuth token management work?

The session utilizes secure JWT-based device trust persistently stored in your local keychain to maintain authentication with the Anthropic cloud infrastructure.
