The Tools System: Teaching Claude How to Touch the World

If the query engine is Claude Code's brain, tools are its hands. An LLM on its own is just a text generator confined to a vacuum. To interact with your file system, run scripts, or connect to the internet, it needs tools. Let's dive into exactly how Anthropic engineered an incredibly robust, deeply typed, and dynamically loaded tool ecosystem for Claude.

The Root: Tool.ts Interface

If you have ever built a simple GPT wrapper, you might have just defined tools as a massive JSON array of schemas. Anthropic took a much more sophisticated, object-oriented approach. In claude-code-src/Tool.ts, every single tool implements a generic interface: Tool<Input, Output, Progress>.

Why this complexity? Because in a terminal application, a tool is not just a schema. It needs to know how to validate inputs, check user permissions before doing something dangerous, render progress spinners using React (Ink), and serialize its output cleanly back to the API. Tying all these concerns perfectly into one single object eliminates an enormous amount of boilerplate and hard-to-track UI bugs.

// Tool.ts  - The architecture of a tool (from the actual source)
export type Tool<
  Input extends AnyObject = AnyObject,
  Output = unknown,
  P extends ToolProgressData = ToolProgressData,
> = {
  readonly name: string
  readonly inputSchema: Input

  // The actual action takes place here:
  call(args, context, canUseTool, parentMessage, onProgress?): Promise<ToolResult<Output>>

  // Crucial Security Flags:
  isConcurrencySafe(input): boolean   // Can multiple instances run at once?
  isReadOnly(input): boolean          // Does it change anything on disk?
  isDestructive?(input): boolean      // Could it cause irreversible damage?
  
  // Terminal React UI (Ink):
  renderToolUseMessage(input, options): React.ReactNode
  renderToolResultMessage?(output, progressMessages, options): React.ReactNode

  // Memory bounds:
  maxResultSizeChars: number          // When to spill output to a tmp file
  readonly shouldDefer?: boolean      // Hide from initial prompt to save tokens
}

Why does every tool have render methods? Claude Code is not a web app. Its UI runs entirely inside your terminal using Ink (React for CLIs). When Claude calls BashTool, you see a nicely formatted command block with a spinner. That spinner, the progress bar, the diff preview for file edits: all of it is rendered by the tool's own renderToolUseMessage and renderToolResultMessage methods. The tool owns its entire visual lifecycle.

The Safety Net: buildTool() Factory

When you have dozens of tools, forgetting to implement a security check in just one of them can be disastrous. Imagine shipping a tool that accidentally defaults to "no permission needed" and lets Claude run destructive commands silently. To prevent this, Anthropic routes every single tool through the buildTool() factory function.

If a developer creates a new tool but forgets to specify whether it writes to disk, buildTool steps in and applies fail-closed defaults. It assumes the tool is NOT concurrency safe and NOT read-only, which means the permission system will automatically flag it and force user approval. You have to explicitly opt out of safety checks, not opt into them. That is a brilliant design decision.

// These are the actual defaults from Tool.ts (line 757)
const TOOL_DEFAULTS = {
  isEnabled:          () => true,
  isConcurrencySafe:  (_input?) => false,   // Assume NOT safe (fail-closed)
  isReadOnly:         (_input?) => false,   // Assume it writes (fail-closed)
  isDestructive:      (_input?) => false,
  checkPermissions:   (input, _ctx?) =>
    Promise.resolve({ behavior: 'allow', updatedInput: input }),
  toAutoClassifierInput: (_input?) => '',   // Skip classifier for most tools
  userFacingName:     (_input?) => '',
}

// Every tool in the codebase uses this pattern:
export const GrepTool = buildTool({
  name: 'Grep',
  inputSchema: z.object({ pattern: z.string(), path: z.string() }),
  maxResultSizeChars: 50_000,
  isReadOnly: () => true,           // Override: grep is safe
  isConcurrencySafe: () => true,    // Override: multiple greps are fine
  async call(args) { /* ... */ },
  renderToolUseMessage(input) { /* ... */ },
})
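
To make the fail-closed idea concrete, here is a minimal sketch of how a factory like this could fill in missing flags. It assumes a heavily simplified tool shape; ToolDef, BuiltTool, SKETCH_DEFAULTS, and buildToolSketch are hypothetical names for illustration, not the real buildTool() implementation.

```typescript
// Hypothetical, simplified tool shape; the real Tool type has many more members.
type ToolDef = {
  name: string
  call: (args: Record<string, unknown>) => string
  isReadOnly?: (input?: unknown) => boolean
  isConcurrencySafe?: (input?: unknown) => boolean
  isDestructive?: (input?: unknown) => boolean
}

// After the factory runs, every flag is guaranteed to exist.
type BuiltTool = Required<ToolDef>

// Fail-closed defaults mirroring the values quoted above.
const SKETCH_DEFAULTS = {
  isReadOnly: (_input?: unknown) => false,        // assume it writes
  isConcurrencySafe: (_input?: unknown) => false, // assume it must run alone
  isDestructive: (_input?: unknown) => false,
}

// Explicit overrides win; anything left unspecified falls back to the default.
function buildToolSketch(def: ToolDef): BuiltTool {
  return {
    name: def.name,
    call: def.call,
    isReadOnly: def.isReadOnly ?? SKETCH_DEFAULTS.isReadOnly,
    isConcurrencySafe: def.isConcurrencySafe ?? SKETCH_DEFAULTS.isConcurrencySafe,
    isDestructive: def.isDestructive ?? SKETCH_DEFAULTS.isDestructive,
  }
}

// A tool that forgets its flags is treated as a writer that cannot run in parallel.
const forgetful = buildToolSketch({ name: 'Mystery', call: () => 'done' })
console.log(forgetful.isReadOnly(), forgetful.isConcurrencySafe())
```

Using ?? per field rather than an object spread means a flag explicitly set to undefined still falls back to the safe default.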

The Roster: How tools.ts Assembles Everything

All 40+ tool implementations live in their own directories under tools/. The file tools.ts is the central registry that pulls them all together into a single ordered list. But it is far from a simple import-and-export. This file contains some of the most interesting engineering decisions in the entire codebase.

Dead-Code Elimination with Feature Flags

Not every tool ships in every build. Anthropic uses Bun's compile-time feature() macro to strip entire tools from the final binary. If a feature flag is off, the tool's code literally does not exist in the compiled output. Zero bytes. Zero overhead.

// tools.ts - conditional tool loading (from the actual source)
import { feature } from 'bun:bundle'

// These tools only exist when their feature flag is enabled at BUILD time:
const SleepTool = feature('PROACTIVE') || feature('KAIROS')
    ? require('./tools/SleepTool/SleepTool.js').SleepTool
    : null

const cronTools = feature('AGENT_TRIGGERS')
  ? [
      require('./tools/ScheduleCronTool/CronCreateTool.js').CronCreateTool,
      require('./tools/ScheduleCronTool/CronDeleteTool.js').CronDeleteTool,
      require('./tools/ScheduleCronTool/CronListTool.js').CronListTool,
    ]
  : []

// Internal-only tools gated by environment variable:
const REPLTool = process.env.USER_TYPE === 'ant'
    ? require('./tools/REPLTool/REPLTool.js').REPLTool
    : null

Takeaway for your own projects: If you maintain a product with internal-only features or experimental capabilities, compile-time dead-code elimination is far more secure than runtime checks. A runtime if statement can be patched out by a determined user. Code that was never compiled into the binary simply cannot be accessed.

Why Tool Order Matters

The order of tools in the getAllBaseTools() array is not random. It directly affects the system prompt sent to the Claude API. Tools listed first appear more prominently in the prompt, which means Claude is statistically more likely to reach for them. That is why AgentTool, BashTool, FileReadTool, and FileEditTool are always at the top of the list.

// tools.ts - the master tool list (simplified from source)
export function getAllBaseTools(): Tools {
  return [
    AgentTool,          // First: orchestrate subtasks
    TaskOutputTool,     // Monitor background work
    BashTool,           // Execute shell commands
    GlobTool, GrepTool, // Search and discover
    FileReadTool,       // Read files
    FileEditTool,       // Surgical edits
    FileWriteTool,      // Create new files
    // ... 30+ more tools, ordered by importance
    BriefTool,          // Last: session handoff (rarely needed)
  ]
}

Token Budgeting: Deferred Tools

Claude Code ships with more than 40 built-in tools. Sending every one of those API schemas in the system prompt on every conversation turn would consume thousands of tokens and increase both API cost and latency. To solve this, Anthropic implemented Deferred Tools.

By defining readonly shouldDefer: true, a tool is temporarily hidden from Claude's initial prompt. The model only knows about the core alwaysLoad tools directly (like Bash, File Edit, File Read). If Claude encounters a task needing a niche tool (like creating a Scheduled Cron job), it first calls the ToolSearchTool with a keyword query. The system returns the hidden schema, and then Claude can use it.

Each deferred tool includes a searchHint property (a short phrase like "create recurring cron scheduled agent task") that the ToolSearch algorithm matches against. Think of it as the tool advertising its capabilities to an internal search engine.
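
The matching step can be pictured with a small keyword scorer. This is a sketch under stated assumptions: the article does not show the actual ToolSearch algorithm, so DeferredTool, scoreTool, and searchDeferredTools are hypothetical, and the NotebookEdit hint is invented for the example.

```typescript
// Hypothetical shape: only the fields this sketch needs.
type DeferredTool = { name: string; searchHint: string }

// Score a tool by how many query words appear in its name or hint.
// Words of one or two characters are ignored to avoid noise matches.
function scoreTool(tool: DeferredTool, query: string): number {
  const haystack = `${tool.name} ${tool.searchHint}`.toLowerCase()
  const words = query.toLowerCase().split(/\s+/).filter(w => w.length > 2)
  return words.filter(w => haystack.includes(w)).length
}

// Return matching tools, best match first; tools with no hits are dropped.
function searchDeferredTools(
  tools: DeferredTool[],
  query: string,
  limit = 5,
): DeferredTool[] {
  return tools
    .map(tool => ({ tool, score: scoreTool(tool, query) }))
    .filter(entry => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(entry => entry.tool)
}

const deferredSketch: DeferredTool[] = [
  { name: 'CronCreate', searchHint: 'create recurring cron scheduled agent task' },
  { name: 'NotebookEdit', searchHint: 'edit jupyter notebook cells' }, // invented hint
]

// Only the cron tool matches, so only its schema would be surfaced.
console.log(searchDeferredTools(deferredSketch, 'schedule a cron job').map(t => t.name))
```

A production implementation would likely use something more robust than substring matching, but the contract is the same: a query goes in, a short ranked list of tools comes out.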

Design lesson: If you are building LLM applications with more than 10-15 tools, implement a tool-search discovery mechanism like this. It keeps your system prompt slim, reduces token costs, and paradoxically improves tool selection accuracy because the model is not overwhelmed by irrelevant options.

Managing Memory: maxResultSizeChars

What happens if Claude asks to run ls -laR / and the bash tool outputs 50 megabytes of text? Feeding that back into the LLM context window would instantly blow up the token limit and crash the session.

Anthropic enforces a strict maximum cap via the maxResultSizeChars property on every tool. If a tool's output string exceeds this limit, the system gracefully traps the output, saves it to a temporary file on the user's disk, and instead returns a compressed summary to Claude: "Output was too large. I saved the full result to /tmp/claude-xyz.txt. You can read specific sections with FileReadTool."

The one exception is FileReadTool itself, which sets maxResultSizeChars: Infinity. Think about why: if a file read produces output that gets persisted to another file, Claude would need to use FileReadTool to read that file, which could again be too large, creating an infinite loop. So FileReadTool manages its own output size internally with line-range parameters.
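
The spill-to-file behavior can be sketched as a small wrapper around a tool's output. The names here (CappedResult, capToolOutput, the persist callback, and the 200-character preview) are assumptions made for illustration; the article does not show the real implementation. The persist callback stands in for "write to a temp file and return its path", which keeps the sketch independent of any file-system API.

```typescript
// Hypothetical result shape for this sketch.
type CappedResult = { text: string; spilledTo?: string }

function capToolOutput(
  output: string,
  maxResultSizeChars: number,
  persist: (content: string) => string, // returns the path the content was saved to
): CappedResult {
  // Infinity disables the cap, matching the FileReadTool exception described above.
  if (output.length <= maxResultSizeChars) {
    return { text: output }
  }
  const path = persist(output)
  const preview = output.slice(0, 200)
  return {
    text:
      `Output was too large (${output.length} chars). ` +
      `Full result saved to ${path}. Preview:\n${preview}`,
    spilledTo: path,
  }
}

// In-memory stand-in for the temp directory, so the example runs anywhere.
const savedFiles = new Map<string, string>()
const saveInMemory = (content: string): string => {
  const path = `/tmp/claude-sketch-${savedFiles.size}.txt`
  savedFiles.set(path, content)
  return path
}

const capped = capToolOutput('x'.repeat(100_000), 50_000, saveInMemory)
console.log(capped.spilledTo, capped.text.length)
```

The model receives a short notice plus a preview instead of the full payload, and can fetch specific ranges from the saved file afterwards.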

The Full Tool Catalog

Here is a breakdown of every tool category in the codebase, with explanations of why each tool exists and how it fits into the larger architecture.

Filesystem Tools

The bread and butter for any coding agent. These let Claude navigate and manipulate your repository.

Tool | What it does and why it matters
FileReadTool | Reads file contents with optional line ranges. Handles images, PDFs, and binary files automatically. Sets maxResultSizeChars: Infinity to avoid the circular persist-to-file problem described above.
FileEditTool | The surgical knife. Instead of replacing an entire file (which is slow and token-expensive), it uses precise string-match replacement to patch specific sections. Shows a diff preview in the terminal before applying.
FileWriteTool | Creates new files or completely overwrites existing ones. The permission system enforces that Claude must read a file before overwriting it, preventing accidental data loss.
GlobTool | Fast file-path matching. Returns results sorted by modification time, feeding the model the most recently touched (and therefore most relevant) files first.
GrepTool | Regex search across file contents using ripgrep under the hood. The backbone of code discovery: Claude uses this to find function definitions and variable usages before editing.
LSPTool | Language Server Protocol integration: hover info, go-to-definition, find-references. Gives Claude IDE-level understanding of your code's type system.

Execution Tools

When the model writes code, it needs to test it. This is what sets Claude Code apart from basic web-based chatbots.

Tool | What it does and why it matters
BashTool | Spawns a persistent shell session (PTY), so environment variables and working directories persist across calls. Features strict execution timeouts to prevent infinite loops, and integrates with the security classifier to flag dangerous commands before they run.
PowerShellTool | The Windows equivalent. Mirrors BashTool's API but targets PowerShell. Only loaded when running on Windows (dead-code eliminated on other platforms).
NotebookEditTool | Edits Jupyter Notebook cells without corrupting the underlying JSON structure. Supports adding, modifying, and deleting cells while preserving execution outputs.

Agent & Task Tools

This is where Claude Code gets truly powerful: the ability to spawn copies of itself for parallel work.

Tool | What it does and why it matters
AgentTool | Spawns a child agent with its own isolated tool context and message history. The parent agent can run multiple subagents concurrently, breaking massive refactors into digestible, parallel chunks.
TaskCreateTool | Creates background tasks (long-running subagents) that execute independently. Returns a Task ID so the main thread can poll progress asynchronously, just like async programming patterns.
TaskGetTool | Retrieves the status and output of a background task by its ID. Read-only.
TaskStopTool | Terminates a running background task. Think of it as kill for agent processes.
SendMessageTool | Sends a message to another agent in a swarm/team configuration. Enables multi-agent collaboration on complex problems.
TeamCreateTool | Creates a team of coordinated agents. Only available when the AGENT_SWARMS feature is enabled.

Web & Network Tools

Tool | What it does and why it matters
WebFetchTool | Fetches web pages and converts HTML to clean markdown for Claude to consume. Handles redirects, authentication, and even PDF rendering.
WebSearchTool | Performs web searches and returns ranked results with snippets. Backed by search APIs. Read-only.

MCP & Integration Tools

Anthropic knew they could not build every tool imaginable. Instead of a messy plugin system, they implemented the Model Context Protocol (MCP) as the universal extension mechanism.

Tool | What it does and why it matters
MCPTool | A dynamic proxy. At runtime, it constructs a full Tool interface for any external server that speaks MCP. Claude Code can instantly "learn" to use your Postgres database or Stripe API without Anthropic changing a single line of core code.
McpAuthTool | Handles mid-session authentication flows (OAuth redirects, token exchange) when an MCP server requires credentials.
ListMcpResourcesTool | Enumerates available resources from connected MCP servers so Claude knows what data sources are accessible.
ReadMcpResourceTool | Reads a specific resource from an MCP server by URI.

Utility & Meta Tools

Tool | What it does and why it matters
ToolSearchTool | The discovery engine for deferred tools. Claude searches by keyword, and the system returns matching tool schemas on demand.
SkillTool | Executes a "skill" (a bundled script or slash-command workflow) programmatically from within a tool call.
TodoWriteTool | Manages a live todo list in the REPL sidebar. Interestingly, it renders nothing in the chat transcript because its output goes to the sidebar panel instead.
AskUserQuestionTool | Pauses execution to ask the user a clarifying question and wait for a response. Only works in interactive mode.
BriefTool | Creates a compressed summary of the current session for handoff to another agent. Think of it as "context serialization" for agent-to-agent communication.
EnterPlanModeTool | Switches Claude into "plan mode" where every tool call requires explicit user approval. The model can enter this mode voluntarily when it detects a high-risk situation.
ConfigTool | Reads and writes Claude Code configuration (user settings, project settings). Only available in internal builds.

Source Deep Dive: files.ts — Binary File Detection

Before any filesystem tool reads a file, Claude Code needs to know whether it's text or binary. Feeding raw binary data to an LLM as if it were source code wastes tokens and produces garbled output. The logic lives in src/utils/files.ts, which exports a two-stage detection system: a fast extension blocklist and a content-heuristic fallback.

Stage 1: The Extension Blocklist

BINARY_EXTENSIONS is a JavaScript Set containing roughly 80 file extensions across every category of binary content. It's organized by type, giving a clear picture of what Claude Code considers "untouchable as text":

Category | Extensions included
Images | .png, .jpg, .jpeg, .gif, .bmp, .ico, .webp, .tiff, .tif
Video | .mp4, .mov, .avi, .mkv, .webm, .wmv, .flv, .m4v, .mpeg, .mpg
Audio | .mp3, .wav, .ogg, .flac, .aac, .m4a, .wma, .aiff, .opus
Archives | .zip, .tar, .gz, .bz2, .7z, .rar, .xz, .z, .tgz, .iso
Executables | .exe, .dll, .so, .dylib, .bin, .o, .a, .obj, .lib, .app, .msi, .deb, .rpm
Documents | .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .odt, .ods, .odp
Fonts | .ttf, .otf, .woff, .woff2, .eot
Bytecode / VM | .pyc, .pyo, .class, .jar, .war, .ear, .node, .wasm, .rlib
Databases | .sqlite, .sqlite3, .db, .mdb, .idx
Design / 3D | .psd, .ai, .eps, .sketch, .fig, .xd, .blend, .3ds, .max
Flash | .swf, .fla
Lock / profiling | .lockb, .dat, .data

The lookup function is a one-liner:

export function hasBinaryExtension(filePath: string): boolean {
  const ext = filePath.slice(filePath.lastIndexOf('.')).toLowerCase()
  return BINARY_EXTENSIONS.has(ext)
}

This runs in O(1) time because it's a Set lookup, making the extension check essentially free on every file open. Notice that .pdf is included in the binary list, which would seem to prevent Claude from reading PDFs. The source comment explains the exception: "PDF is here; FileReadTool excludes it at the call site." FileReadTool has special-cased PDF to invoke a PDF extraction pipeline rather than raw file reading, so the extension check is bypassed for PDFs specifically.

Stage 2: The Content Heuristic

Extension matching isn't enough — files can be misnamed, and new binary formats appear regularly. The second check, isBinaryContent(), inspects the actual file bytes:

const BINARY_CHECK_SIZE = 8192  // Only check first 8KB

export function isBinaryContent(buffer: Buffer): boolean {
  const checkSize = Math.min(buffer.length, BINARY_CHECK_SIZE)

  let nonPrintable = 0
  for (let i = 0; i < checkSize; i++) {
    const byte = buffer[i]!

    // Null byte is a strong indicator of binary
    if (byte === 0) {
      return true  // Early exit: single null byte = binary
    }

    // Count non-printable, non-whitespace bytes
    // Printable ASCII is 32-126, plus tab(9), newline(10), CR(13)
    if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) {
      nonPrintable++
    }
  }

  // If more than 10% non-printable, likely binary
  return nonPrintable / checkSize > 0.1
}

The heuristic has two tiers. The first is an immediate early exit: a single null byte (\0) anywhere in the first 8 KB is treated as a definitive binary indicator. Nearly all binary formats (ELF executables, PE headers, compiled JARs, ZIP archives) have null bytes in their headers, while valid UTF-8 text essentially never does. The second tier counts all bytes below ASCII 32 that aren't standard whitespace (tab, newline, carriage return) — if more than 10% of the sampled bytes are non-printable, the file is classified as binary.

Why only 8 KB? Reading the full file just to determine if it's readable would be expensive for large files. The 8 KB sample is a well-established heuristic: if a file is text, its first 8 KB will reflect that reliably. Tools like git, file, and most IDEs use a similar approach. The cost is a small misclassification rate in pathological cases: a file whose first 8 KB looks like text but whose later content is binary slips through as text, and a mostly-text file with a binary header is rejected. Both are acceptable given how rarely they occur.

Where Binary Detection Runs

These two functions are called in sequence in FileReadTool before any file content is returned to the model. If hasBinaryExtension() returns true (and the file isn't PDF), the tool returns an informative error message like "This file has a binary extension and cannot be read as text." If the extension passes but isBinaryContent() returns true on the first 8 KB, the same error is returned. The model never sees the raw bytes, preventing both garbage output and context window contamination from binary data.
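
The sequencing described above can be sketched as a single decision function. This is an illustration under assumptions, not FileReadTool's actual code: decideRead and ReadDecision are hypothetical, the extension set is a tiny stand-in for BINARY_EXTENSIONS, the helper functions are re-declared with a Sketch suffix so the block is self-contained, and the wording of the second rejection message is invented.

```typescript
// Tiny stand-in for the full BINARY_EXTENSIONS set described above.
const SAMPLE_BINARY_EXTENSIONS = new Set(['.png', '.zip', '.exe', '.pdf'])

function hasBinaryExtensionSketch(filePath: string): boolean {
  const dot = filePath.lastIndexOf('.')
  if (dot === -1) return false // no extension at all
  return SAMPLE_BINARY_EXTENSIONS.has(filePath.slice(dot).toLowerCase())
}

function isBinaryContentSketch(bytes: Uint8Array): boolean {
  const checkSize = Math.min(bytes.length, 8192)
  if (checkSize === 0) return false // treat empty files as text
  let nonPrintable = 0
  for (let i = 0; i < checkSize; i++) {
    const byte = bytes[i]!
    if (byte === 0) return true // a null byte is decisive
    if (byte < 32 && byte !== 9 && byte !== 10 && byte !== 13) nonPrintable++
  }
  return nonPrintable / checkSize > 0.1
}

type ReadDecision =
  | { kind: 'text' }                     // safe to return as text
  | { kind: 'pdf' }                      // hand off to a PDF extraction path
  | { kind: 'rejected'; reason: string } // never shown to the model

function decideRead(filePath: string, bytes: Uint8Array): ReadDecision {
  // PDFs are in the blocklist but are special-cased before the check applies.
  if (filePath.toLowerCase().endsWith('.pdf')) return { kind: 'pdf' }
  // Stage 1: cheap extension lookup.
  if (hasBinaryExtensionSketch(filePath)) {
    return { kind: 'rejected', reason: 'This file has a binary extension and cannot be read as text.' }
  }
  // Stage 2: content heuristic on the first 8 KB.
  if (isBinaryContentSketch(bytes)) {
    return { kind: 'rejected', reason: 'This file appears to contain binary content.' }
  }
  return { kind: 'text' }
}

// A misnamed binary: the extension passes, but the null byte is caught in stage 2.
console.log(decideRead('notes.txt', Uint8Array.from([104, 0, 105])).kind)
```

Running the cheap check first means most binary files are rejected without their bytes being inspected at all.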

The Safety Protocol

Of course, arming an AI with BashTool without safeguards would be reckless. Every single tool.call() invocation passes through a multi-layered permissions interceptor before anything actually executes.

validateInput() → checkPermissions() → Permission Rules → User Dialog → tool.call()

First, validateInput() checks that the arguments are structurally valid. Then checkPermissions() runs tool-specific security logic. The result is one of three outcomes:

  • { behavior: 'allow' } - Proceed immediately. Used for read-only tools like GrepTool.
  • { behavior: 'ask' } - Freeze execution and show a permission dialog in the terminal. The model waits until you explicitly approve.
  • { behavior: 'deny' } - Reject the tool call entirely and tell Claude why it was denied.

Tools that declare isDestructive: () => true trigger an even more prominent confirmation dialog. This is how Claude Code ensures you always stay in control, even when the agent is running autonomously.
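
The control flow of that interceptor can be sketched in a few lines. Everything here is a simplification under stated assumptions: GuardedTool, runTool, and the askUser callback are hypothetical names, the message field on a deny result is assumed, and the sketch is synchronous even though the defaults shown earlier return a Promise from checkPermissions.

```typescript
type PermissionResult =
  | { behavior: 'allow' }
  | { behavior: 'ask' }
  | { behavior: 'deny'; message: string } // message field is an assumption

// Hypothetical, synchronous tool shape for illustration only.
type GuardedTool<I> = {
  name: string
  validateInput: (input: I) => boolean
  checkPermissions: (input: I) => PermissionResult
  call: (input: I) => string
}

function runTool<I>(
  tool: GuardedTool<I>,
  input: I,
  askUser: (toolName: string) => boolean, // stand-in for the terminal dialog
): string {
  // Step 1: structural validation.
  if (!tool.validateInput(input)) return `Error: invalid input for ${tool.name}`
  // Step 2: tool-specific permission logic.
  const decision = tool.checkPermissions(input)
  if (decision.behavior === 'deny') return `Denied: ${decision.message}`
  // Step 3: pause for the user when the tool asks for confirmation.
  if (decision.behavior === 'ask' && !askUser(tool.name)) {
    return `Denied: user rejected ${tool.name}`
  }
  // Step 4: only now does the tool actually run.
  return tool.call(input)
}

// A toy shell tool: refuses empty commands, blocks sudo, asks before anything with "rm".
const shellSketch: GuardedTool<{ command: string }> = {
  name: 'ShellSketch',
  validateInput: input => input.command.trim().length > 0,
  checkPermissions: input => {
    if (input.command.includes('sudo')) return { behavior: 'deny', message: 'sudo is not permitted' }
    if (input.command.includes('rm')) return { behavior: 'ask' }
    return { behavior: 'allow' }
  },
  call: input => `ran: ${input.command}`,
}

console.log(runTool(shellSketch, { command: 'rm -rf build' }, () => false))
```

The key property is that tool.call() is unreachable unless every earlier stage has passed.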

For a complete deep dive into the permission modes (Interactive, Auto, Bypass) and the security classifier, see the Security & Permissions page.

Frequently Asked Questions

How do I add my own tool?
Following the pattern shown above, you define an object that satisfies the Tool type, give it a zod schema for its arguments, pass it through buildTool(), and register it in the central tool list in tools.ts.
Are bash commands executed natively or safely?
BashTool executes commands natively, which is why the permission system acts as a crucial checkpoint before potentially dangerous executions.
How does the WebFetchTool work?
It scrapes specific URLs, extracts the core content usually converting to markdown, and allows Claude to natively read and interpret remote documentation.
Why are there so many tools built-in?
Having over 40 specialised tools like FileEdit, Bash, WebFetch, and MCP prevents the model from relying on slow, error-prone generic bash scripts to perform complex multi-step actions.
Can tools yield partial results?
Yes, tools can yield ProgressMessages which render real-time UI feedback to the terminal before the tool action fully resolves.
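
As a rough illustration of partial results, the sketch below reports progress through an optional callback, similar in spirit to the onProgress parameter in the call signature shown earlier. SketchProgress and processItems are hypothetical names; the real ToolProgressData shape is not shown in this article.

```typescript
// Hypothetical progress payload for this sketch.
type SketchProgress = { processed: number; total: number }

function processItems(
  items: string[],
  onProgress?: (progress: SketchProgress) => void,
): string {
  let totalChars = 0
  items.forEach((item, index) => {
    totalChars += item.length
    // Report after each item so a UI can update before the final result exists.
    onProgress?.({ processed: index + 1, total: items.length })
  })
  return `Processed ${items.length} items, ${totalChars} chars`
}

// A terminal UI would redraw a spinner or progress bar here; the sketch just logs.
const summary = processItems(['ab', 'cd', 'ef'], p =>
  console.log(`${p.processed}/${p.total}`),
)
console.log(summary)
```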