Building Custom Agents and Sub-Agents
Design multi-agent architectures where a parent agent delegates to specialised sub-agents. Build a practical code review system.
The agent architecture
In Claude Code, the agent architecture follows a parent-child delegation pattern. The parent agent is your main Claude Code session — it receives your high-level instruction, breaks it into sub-tasks, and delegates each sub-task to a specialised sub-agent. Each sub-agent gets its own context window, its own set of tools, and its own CLAUDE.md instructions. The parent agent orchestrates: it decides what to delegate, monitors results, and synthesises the final output.

Think of it like a senior developer managing a team. You tell the senior dev: "Refactor the authentication system." They break this into tasks — update the login flow, migrate the session store, update the tests, update the documentation — and assign each to the right specialist. The specialists work independently and report back. The senior dev reviews their work and delivers the integrated result.

The key insight: sub-agents do not share context with each other. Each one sees only what the parent explicitly passes to it. This is a feature, not a limitation — it keeps each agent focused and prevents context pollution. The parent decides what context each child needs, which is exactly how effective delegation works in human teams too.
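The delegation pattern can be sketched in plain Python. This is a toy simulation, not the Claude Code API — `SubAgent` and `ParentAgent` are hypothetical names chosen to mirror the description above:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """A sub-agent: its own role prompt, tool set, and private context."""
    role: str
    system_prompt: str
    tools: list[str]
    context: list[str] = field(default_factory=list)  # never shared with siblings

    def run(self, task: str) -> str:
        # A real implementation would call the model here; this just simulates.
        self.context.append(task)
        return f"[{self.role}] completed: {task}"

class ParentAgent:
    """Orchestrator: decides what to delegate and to whom."""
    def delegate(self, assignments: list[tuple[SubAgent, str]]) -> list[str]:
        # Each sub-agent sees only the task the parent explicitly passes it.
        return [agent.run(task) for agent, task in assignments]
```

Note that `context` lives on each `SubAgent` instance — there is no shared state, which is the point of the pattern.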
When to use sub-agents vs a single agent
Not every task needs sub-agents. In fact, most tasks are better handled by a single Claude Code session. Use sub-agents when you have genuinely independent sub-tasks that benefit from separate contexts.

Good use cases for sub-agents: large refactoring across multiple unrelated modules, parallel code review where each reviewer checks a different concern (security, performance, style), generating tests for multiple files simultaneously, and research tasks where each sub-agent investigates a different approach.

Bad use cases for sub-agents: tasks where the sub-tasks are sequential and each depends on the previous one's output; tasks where the total work is small enough to fit in one context window; tasks where the coordination overhead exceeds the parallelism benefit.

The decision framework is simple. Ask: can these sub-tasks run independently? Do they benefit from separate contexts? Is the total work too large for one context window? If you answer yes to at least two of these, sub-agents are worth considering. If not, keep it simple with a single agent. Premature decomposition into sub-agents adds complexity without benefit. Start with one agent. Split only when you hit real limits.
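The "yes to at least two of three" rule translates directly into a tiny helper — a sketch of the decision framework, nothing more:

```python
def should_use_subagents(independent: bool,
                         separate_contexts_help: bool,
                         too_big_for_one_context: bool) -> bool:
    """Decision framework: use sub-agents only if at least two answers are yes."""
    return sum([independent, separate_contexts_help, too_big_for_one_context]) >= 2
```

A sequential pipeline that fits in one context window scores at most one yes, so it stays with a single agent.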
Configuring agent roles with CLAUDE.md
Each sub-agent can have its own CLAUDE.md-style instructions that define its role and constraints. When the parent agent spawns a sub-agent, it passes a system prompt that acts as that agent's mission brief. For a code review system, you might define three agent roles.

The Security Reviewer's prompt: "You are a security-focused code reviewer. Check for: SQL injection, XSS, authentication bypass, secret exposure, insecure dependencies. Flag severity (critical/high/medium/low). Ignore style issues — another reviewer handles those."

The Performance Reviewer's prompt: "You are a performance-focused code reviewer. Check for: N+1 queries, unnecessary re-renders, missing memoisation, large bundle imports, unoptimised images. Suggest specific fixes."

The Style Reviewer's prompt: "You are a style and maintainability reviewer. Check for: naming consistency, function length, code duplication, missing types, unclear logic. Reference the project's CLAUDE.md coding standards."

The parent agent's CLAUDE.md then says: "When asked to review code, spawn three sub-agents with these roles. Collect their results. Merge into a single review organised by severity. Remove duplicates. Present a unified review." This separation of concerns produces dramatically better reviews than a single agent trying to check everything at once.
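The parent-side wiring can be sketched as a role-to-prompt mapping plus a spawn helper. `REVIEWER_ROLES` and `spawn_reviewers` are hypothetical names, and `spawn_fn` stands in for whatever mechanism actually launches a sub-agent; the prompts are abridged from the text above:

```python
# Hypothetical role prompts, one per reviewer (abridged from the mission briefs above)
REVIEWER_ROLES = {
    "security": "You are a security-focused code reviewer. Check for SQL injection, "
                "XSS, authentication bypass, secret exposure, insecure dependencies. "
                "Flag severity (critical/high/medium/low). Ignore style issues.",
    "performance": "You are a performance-focused code reviewer. Check for N+1 queries, "
                   "unnecessary re-renders, missing memoisation, large bundle imports. "
                   "Suggest specific fixes.",
    "style": "You are a style and maintainability reviewer. Check naming consistency, "
             "function length, code duplication, missing types, unclear logic.",
}

def spawn_reviewers(spawn_fn, diff: str) -> dict[str, str]:
    """Parent-side helper: one sub-agent per role, each seeing only the diff."""
    return {role: spawn_fn(prompt, diff) for role, prompt in REVIEWER_ROLES.items()}
```

Keeping the prompts in one mapping makes the separation of concerns explicit: adding a fourth reviewer is one more dictionary entry, not a rewrite of the orchestration logic.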
The plan-execute-verify loop
Effective agent systems follow a three-phase loop: plan, execute, verify.

In the planning phase, the parent agent analyses the task, identifies sub-tasks, determines dependencies between them, and decides the execution order. For a code review, planning means: identify which files changed, determine the review scope, and assign reviewers.

In the execution phase, sub-agents perform their assigned work. They read files, analyse code, run tests, or generate output — whatever their role requires. The parent agent monitors progress and handles any failures or unexpected results. A sub-agent that encounters an error reports back to the parent, which decides whether to retry, reassign, or skip.

In the verification phase, the parent agent reviews sub-agent outputs for quality and consistency. It checks for contradictions (one reviewer says "add caching" while another says "remove unnecessary caching"), merges overlapping findings, and produces a coherent final result.

The loop can repeat: if verification reveals gaps, the parent agent plans additional work and sends sub-agents back for another pass. This iterative approach is more reliable than a single-pass system. The plan-execute-verify loop mirrors how human teams work effectively — and it is the pattern behind every reliable agent system.
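The three phases, including the repeat-on-gaps behaviour, fit in a short generic loop. This is an illustrative skeleton — the three callables stand in for real planning, sub-agent execution, and verification logic:

```python
def plan_execute_verify(plan, execute, verify, max_passes=3):
    """Generic plan-execute-verify loop.

    plan() returns the initial task list; execute(task) returns one result;
    verify(results) returns follow-up tasks (an empty list means done).
    """
    tasks = plan()
    all_results = []
    for _ in range(max_passes):
        results = [execute(task) for task in tasks]
        all_results.extend(results)
        tasks = verify(results)  # gaps found during verification become new tasks
        if not tasks:
            break
    return all_results
```

The `max_passes` cap matters: without it, a verifier that keeps finding gaps would loop forever.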
Practical example: multi-agent code review system
Let us build a complete multi-agent code review system. The parent agent receives a command: /review-pr 42. It fetches the PR diff from GitHub (via MCP), identifies the changed files, and categorises them by type (API routes, UI components, tests, configuration). It then spawns sub-agents.

The Security Agent reviews API routes and authentication logic, checking for injection vulnerabilities, missing auth checks, and data exposure. The Performance Agent reviews database queries and UI components, checking for N+1 queries, missing indexes, unnecessary re-renders, and bundle size impact. The Test Coverage Agent checks whether the changes have adequate tests, identifies untested edge cases, and suggests specific test scenarios.

Each sub-agent works independently, examining only the files relevant to their domain. They produce structured output: a list of findings, each with a file path, line number, severity, description, and suggested fix. The parent agent collects all findings, removes duplicates, resolves contradictions, and organises the final review by severity. Critical issues first, then high, medium, and low. The output is formatted as a PR comment that the parent posts back to GitHub via MCP.

Total time: two to three minutes for a thorough, multi-perspective review that would take a human reviewer 30 minutes.
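The structured-output and merge steps can be sketched as follows. `Finding` and `merge_findings` are hypothetical names modelling the schema described above (file path, line, severity, description, suggested fix):

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

@dataclass(frozen=True)  # frozen makes findings hashable, so sets deduplicate them
class Finding:
    file: str
    line: int
    severity: str
    description: str
    suggested_fix: str

def merge_findings(*agent_findings: list[Finding]) -> list[Finding]:
    """Collect findings from all sub-agents, drop exact duplicates,
    and order the review: critical first, then high, medium, low."""
    unique: set[Finding] = set()
    for findings in agent_findings:
        unique.update(findings)
    return sorted(unique, key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.line))
```

Exact-duplicate removal is the easy half; resolving contradictions between reviewers still needs the parent agent's judgement.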
Error handling and recovery in agent workflows
Agent systems fail. Sub-agents hit context limits, MCP servers time out, external APIs return errors, and code changes break assumptions. Robust error handling is what separates toy demos from production systems.

First, every sub-agent should have a timeout. If a security review takes more than two minutes, something is wrong — the parent agent should terminate it and either retry or proceed without that review.

Second, implement graceful degradation. If the database MCP server is down, the performance review can still check for code-level issues even if it cannot verify query plans. The parent agent should produce the best possible result with available resources, not fail entirely because one component is unavailable.

Third, use structured error reporting. When a sub-agent fails, it should report: what it was trying to do, what went wrong, what it accomplished before failing, and what remains undone. This lets the parent agent make informed retry decisions.

Fourth, implement retry with backoff for transient failures. API rate limits, network blips, and temporary outages resolve themselves — a simple retry often succeeds. But limit retries to avoid infinite loops.

Fifth, log everything. Every agent action, every sub-agent spawn, every error, every retry. When an agent workflow produces unexpected results, logs are the only way to diagnose what happened. Treat agent logs like application logs — they are essential infrastructure, not optional extras.
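The retry-with-backoff rule can be sketched in a few lines. This is a generic pattern, not Claude Code internals; the exception types chosen here are illustrative stand-ins for transient failures:

```python
import random
import time

def retry_with_backoff(fn, retries=3, base_delay=0.5):
    """Retry a transient failure with exponential backoff and jitter.

    Retries are capped so a persistent failure surfaces to the parent
    agent instead of looping forever.
    """
    for attempt in range(retries):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == retries - 1:
                raise  # exhausted: let the parent decide (retry, reassign, or skip)
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The jitter term prevents several sub-agents from retrying in lockstep after a shared outage, which would just re-trigger the rate limit they all hit.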