Automated Testing Pipelines
Build a comprehensive testing pipeline with unit tests, integration tests, end-to-end tests, and CI integration that catches bugs before they reach production.
The testing pyramid and pipeline architecture
A testing pipeline runs automated checks at multiple levels to catch different types of bugs. The testing pyramid defines the layers: many fast unit tests at the base, fewer integration tests in the middle, and a small number of slow end-to-end tests at the top. Each layer catches bugs the others miss: unit tests catch logic errors in individual functions, integration tests catch communication errors between components, and end-to-end tests catch user-facing bugs in the complete workflow.

Ask Claude Code: Create a Next.js project with TypeScript for demonstrating a full testing pipeline. Set up the project with a realistic structure: API routes at src/app/api/, a service layer at src/services/, utility functions at src/lib/, and React components at src/components/. Create sample code at each layer: a utility function (formatCurrency), a service (UserService with CRUD methods that call a database), an API route (GET/POST /api/users), and a component (UserList that fetches and displays users).

Install the testing tools: npm install --save-dev jest ts-jest @types/jest @testing-library/react @testing-library/jest-dom playwright. Configure Jest for unit and integration tests (fast, run in Node.js) and Playwright for end-to-end tests (slower, run in a real browser).

Ask Claude Code: Create the Jest and Playwright configurations. Jest should run tests matching *.test.ts and *.test.tsx; Playwright should run tests matching *.e2e.ts. Set up separate npm scripts: test:unit, test:integration, test:e2e, and test:all.

Each level has different speed and reliability characteristics. Unit tests run in milliseconds and almost never flake. Integration tests run in seconds and occasionally flake due to timing issues. E2E tests run in seconds to minutes and flake most often, due to browser rendering and network variability. The pipeline should reflect this: run unit tests first (fast feedback), then integration tests (medium feedback), then E2E (slow but thorough).
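A minimal Jest configuration matching this split might look like the following sketch. The ts-jest preset and glob patterns are assumptions to adjust for your project; the key idea is that Jest's testMatch never picks up Playwright's *.e2e.ts specs, so the two runners stay separate.

```typescript
// jest.config.ts — a minimal sketch (preset and globs are assumptions).
// testMatch keeps Playwright's *.e2e.ts specs out of Jest's run entirely.
import type { Config } from "jest";

const config: Config = {
  preset: "ts-jest",
  testEnvironment: "node",
  testMatch: ["**/*.test.ts", "**/*.test.tsx"],
  testPathIgnorePatterns: ["/node_modules/"],
};

export default config;
```

The test:unit and test:integration scripts can then point Jest at different directories (for example, jest src/lib versus jest src/app/api) while test:e2e invokes npx playwright test.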
Unit testing with maximum coverage
Unit tests are the foundation. They test individual functions in isolation with mocked dependencies. A well-tested codebase has hundreds of unit tests that run in under 10 seconds.

Ask Claude Code: Write comprehensive unit tests for the utility functions. For formatCurrency: test with positive numbers, negative numbers, zero, very large numbers, decimals that need rounding, different currency codes (GBP, USD, EUR), and invalid inputs (null, undefined, NaN). Give each test a descriptive name: formats positive GBP amount with two decimals rather than test case 1.

Ask Claude Code: Write unit tests for the UserService. Mock the database layer — the service should be tested against a mock, not a real database. Test each method: createUser validates input and calls the database insert, getUserById returns the user when found and null when not found, updateUser validates input and updates only the provided fields, deleteUser soft-deletes by setting a flag rather than removing the row, and listUsers supports pagination and filtering.

Next, centralise the mock configuration. Ask Claude Code: Create a shared mock factory at src/test/mocks/database.ts. The factory creates a mock database client with typed mock functions for query, insert, update, and delete. Each test can configure the mock to return specific data or throw specific errors. Use jest.spyOn for methods that need partial mocking (the real implementation runs but you can inspect calls and override return values).

Add snapshot tests for components. Ask Claude Code: Write snapshot tests for the UI components. Render the UserList component with mock data and compare the output to a stored snapshot. If the output changes, the test fails — this catches unintended UI changes. Use toMatchInlineSnapshot for small outputs (the snapshot is stored in the test file) and toMatchSnapshot for larger outputs (stored in a __snapshots__ directory).

Run the unit tests: npm run test:unit.
Target 80 percent code coverage on the utility and service layers. Ask Claude Code: Generate a coverage report and identify uncovered code paths. Write additional tests to cover the gaps. Common error: testing implementation details instead of behaviour. Do not test that a function calls another function internally — test that the output is correct for given inputs. Internal refactoring should not break tests.
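The shared mock factory described above could take roughly this shape. This sketch records calls by hand so it runs standalone; in the real src/test/mocks/database.ts each method would be a typed jest.fn() instead, and the recorded calls would live on fn.mock.calls.

```typescript
// Sketch of a mock database factory. The recorder stands in for jest.fn():
// it forwards to a configurable implementation while recording every call,
// so tests can both stub return values and inspect how methods were invoked.
type Row = Record<string, unknown>;

function recorder<Args extends unknown[], R>(impl: (...args: Args) => R) {
  const calls: Args[] = [];
  const fn = (...args: Args): R => {
    calls.push(args);
    return impl(...args);
  };
  return Object.assign(fn, { calls });
}

export function createMockDb(seed: Row[] = []) {
  return {
    query: recorder((_sql: string, _params?: unknown[]) => seed),
    insert: recorder((row: Row) => ({ id: seed.length + 1, ...row })),
    update: recorder((id: number, patch: Row) => ({ id, ...patch })),
    delete: recorder((id: number) => ({ id, deleted: true })),
  };
}
```

A test seeds the factory with known rows, exercises the service against it, then asserts on db.insert.calls, the same pattern jest.fn().mock.calls gives you.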
Integration testing with real services
Integration tests verify that your components work together correctly. They use real (test) databases, real HTTP requests, and real middleware — but skip the browser.

Ask Claude Code: Set up integration tests for the API routes. Use supertest to make HTTP requests to the Next.js API routes. Create a test setup that starts the Next.js server, connects to a test database (separate from your development database), seeds the test database with known data, and tears down after the test suite completes.

Write API integration tests. Ask Claude Code: Test the complete request-response cycle for each API endpoint. POST /api/users: send a valid user payload and verify a 201 response with the created user; send an invalid payload (missing email) and verify a 400 response with validation errors; send a duplicate email and verify a 409 conflict response. GET /api/users: verify the list returns seeded users, test pagination parameters (page, limit), test filtering parameters (role, status), and verify the response shape matches the TypeScript type.

Each test should be independent — one test's side effects should not affect another test. Ask Claude Code: Implement test isolation. Each test runs inside a database transaction that rolls back after the test completes. Tests can therefore create and modify data without affecting other tests, and the database returns to its seeded state after each test. This is faster than re-seeding the database for every test.

Add service integration tests. Ask Claude Code: Test the UserService against the real test database (not mocks). These tests verify that the SQL queries actually work, that transactions commit and roll back correctly, and that constraints (unique email, foreign keys) are enforced. They catch bugs that unit tests with mocked databases miss — like a SQL query that is syntactically valid but semantically wrong.

Run the integration tests: npm run test:integration. These should complete in under 30 seconds.
If they take longer, identify the slow tests. Ask Claude Code: Profile my integration test suite. Which tests are slowest? Can any be optimised by reducing database operations or running in parallel? Common error: integration tests that depend on external services (APIs, cloud storage) are fragile. Mock external services at the HTTP level using libraries like nock or msw. Only use real external services in dedicated contract tests.
End-to-end testing with Playwright
End-to-end tests verify complete user workflows in a real browser. They are the closest thing to a real user interacting with your application.

Ask Claude Code: Configure Playwright for the project. Create playwright.config.ts with: the base URL pointing to your local development server, browser configuration (test in Chromium, Firefox, and WebKit for cross-browser coverage), screenshots on failure (to help diagnose flaky tests), and video recording for failed tests.

Write E2E tests for the critical user flows. Ask Claude Code: Create a test file for the user registration flow. The test navigates to the registration page, fills in the name and email fields, submits the form, verifies the success message appears, and verifies the user can access protected content. Use Playwright's locator API for reliable element selection: page.getByRole('button', { name: 'Sign Up' }) is more resilient than page.click('.btn-primary') because it does not break when CSS classes change.

Write a test for the full CRUD workflow. Ask Claude Code: Test creating a user through the UI, viewing the user in the list, editing the user's details, verifying the changes appear, deleting the user, and verifying the user is no longer in the list. Each step should assert visible changes before proceeding — do not just click buttons blindly.

Add visual regression testing. Ask Claude Code: Take screenshots of key pages (homepage, user list, user detail) and compare them to baseline images. If the visual appearance changes by more than a threshold (0.1 percent pixel difference), the test fails. This catches CSS regressions that functional tests miss — a button that works correctly but is invisible because its colour matches the background.

Run the E2E tests: npx playwright test. View the results with: npx playwright show-report. Failed tests include screenshots and videos showing exactly what went wrong.

Common error: flaky E2E tests are the biggest problem.
A test that passes 95 percent of the time fails 5 percent of the time — with 20 such tests, you get a false failure on 64 percent of runs. Mitigate with proper waiting (await page.waitForSelector, not arbitrary timeouts), retries on failure (Playwright supports a retries setting), and isolated test data (each test creates its own data, never depending on another test's).
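The configuration requirements and flake mitigations above might combine into something like this sketch (the values are illustrative, not prescriptive):

```typescript
// playwright.config.ts — an illustrative sketch; tune values to your project.
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testMatch: "**/*.e2e.ts",
  retries: 2, // re-run a failing test before reporting it, absorbing rare flakes
  use: {
    baseURL: "http://localhost:3000",
    screenshot: "only-on-failure", // artifacts for diagnosing failures
    video: "retain-on-failure",
  },
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```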
CI pipeline with GitHub Actions
The testing pipeline should run automatically on every push and pull request. GitHub Actions provides the infrastructure.

Ask Claude Code: Create a GitHub Actions workflow at .github/workflows/test.yml. The workflow triggers on pushes to main and on pull requests. Define three jobs that run sequentially: unit-tests, integration-tests, and e2e-tests. The unit-tests job checks out the code, installs Node.js and dependencies (with caching for node_modules), runs npm run test:unit with coverage reporting, and uploads the coverage report as an artifact. The integration-tests job runs after unit-tests passes, starts a PostgreSQL service container, runs database migrations, runs npm run test:integration, and uploads the results. The e2e-tests job runs after integration-tests passes, installs the Playwright browsers, starts the application server, runs npx playwright test, and uploads the test report, screenshots, and videos as artifacts.

Add quality gates. Ask Claude Code: Configure the workflow to fail if: code coverage drops below 80 percent (compared to the main branch), any test fails (obvious, but it must be explicit), the test suite takes longer than 10 minutes (prevents slow test accumulation), or new code has no corresponding tests (check that PR files have matching test files).

Add a PR comment bot. Ask Claude Code: At the end of the pipeline, post a comment on the pull request with: a test results summary (X passed, Y failed, Z skipped), the code coverage change (coverage went from 82 percent to 84 percent — nice improvement), the performance change (test suite is 3 seconds slower than baseline — investigate), and links to the full test report and coverage report. Use the GitHub API to post the comment. This feedback loop is valuable — developers see the testing results without leaving the PR page.

Common error: skipping CI caching. Cache node_modules between runs (keyed on the package-lock.json hash) and the Playwright browsers (keyed on the Playwright version).
Without caching, every run downloads hundreds of megabytes of dependencies.
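A trimmed sketch of the staged workflow is below. The action versions and service options are assumptions, and the quality gates and PR comment step are omitted for brevity.

```yaml
# .github/workflows/test.yml — a trimmed sketch of the staged pipeline.
name: test
on:
  push:
    branches: [main]
  pull_request:

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm # cache keyed on package-lock.json
      - run: npm ci
      - run: npm run test:unit -- --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/

  integration-tests:
    needs: unit-tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports: ["5432:5432"]
        options: >-
          --health-cmd pg_isready --health-interval 10s
          --health-timeout 5s --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npm run test:integration

  e2e-tests:
    needs: integration-tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
```

The needs keys are what make the jobs run sequentially, giving the fast-feedback-first ordering described above.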
Test monitoring, maintenance, and culture
A testing pipeline is only valuable if the team maintains it. Flaky tests, slow suites, and ignored failures erode trust in the pipeline until people start skipping it.

Ask Claude Code: Create a test health dashboard at src/test/dashboard.ts. Track metrics across test runs: total test count (should grow over time), pass rate (should be above 99 percent), flake rate (tests that sometimes pass and sometimes fail — should be below 1 percent), average suite duration (should not grow faster than the codebase), and coverage trend (should be stable or increasing).

Identify and fix flaky tests. Ask Claude Code: Create a flake detection script that runs the full test suite 5 times and identifies tests that produce different results across runs. For each flaky test, suggest the likely cause: timing-dependent assertions (fix with proper waiting), shared state between tests (fix with test isolation), external service dependency (fix with better mocking), or race conditions in async code (fix with proper synchronisation).

Create a test maintenance routine. Ask Claude Code: Build a script that analyses the test suite and reports: dead tests (tests for code that no longer exists), slow tests (individual tests taking more than 5 seconds), duplicate tests (tests that cover identical code paths), and orphaned mocks (mock configurations that no test uses). Run this monthly and clean up the results.

Add a pre-commit hook for fast feedback. Ask Claude Code: Create a Husky pre-commit hook that runs the unit tests for changed files only. If I change src/lib/formatCurrency.ts, only run src/lib/formatCurrency.test.ts. This gives instant feedback without running the entire suite. Use jest --findRelatedTests to discover which test files cover the changed source files. The hook should complete in under 10 seconds — if it takes longer, developers will disable it.

Build a testing culture guide. Ask Claude Code: Create a TESTING.md document with the team's testing standards: when to write each type of test (unit for logic, integration for APIs, E2E for critical paths), naming conventions for test files and test cases, mock organisation and sharing, how to handle flaky tests (fix immediately or quarantine, never ignore), and how to measure testing effectiveness (not just coverage — also bug escape rate and time to detect regressions).

The complete pipeline catches bugs at every level, runs automatically on every change, and provides fast feedback to developers. This is the foundation of shipping confidently.
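The flake-detection idea (run everything several times and flag anything whose result is not stable) reduces to a few lines. This standalone sketch works on plain boolean checks rather than a real Jest invocation, which is an assumption made so it runs anywhere.

```typescript
// Minimal flake detection: run each check `runs` times and flag any check
// that does not return the same result on every run.
type Check = { name: string; run: () => boolean };

export function findFlaky(checks: Check[], runs = 5): string[] {
  const flaky: string[] = [];
  for (const check of checks) {
    const results = new Set<boolean>();
    for (let i = 0; i < runs; i++) results.add(check.run());
    if (results.size > 1) flaky.push(check.name); // mixed pass/fail across runs
  }
  return flaky;
}

// A stable check and a deliberately flaky one (alternates pass/fail).
let call = 0;
export const report = findFlaky(
  [
    { name: "stable", run: () => true },
    { name: "flaky", run: () => call++ % 2 === 0 },
  ],
  4
);
// report contains only "flaky"
```

A real script would shell out to jest --json repeatedly and diff the per-test results instead of calling functions directly, but the stability comparison is the same.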