Build an AI-Powered Customer Support Bot
Build a support chatbot that answers questions from your knowledge base, handles conversation context, and escalates to humans when needed.
Architecture of an AI support bot
An AI customer support bot has five components: a knowledge base containing your documentation, FAQs, and policies; a retrieval system that finds relevant information in the knowledge base; a language model that generates natural responses using the retrieved information; a conversation manager that tracks context across multiple messages; and an escalation system that routes to a human when the AI cannot help.

This architecture is called RAG, or Retrieval-Augmented Generation. Instead of relying solely on the language model's training data, which may be outdated or wrong about your specific product, you feed it your actual documentation and let it generate answers grounded in that source material. The result is an AI that gives accurate, up-to-date answers specific to your product.

Create the project: `mkdir ~/projects/support-bot && cd ~/projects/support-bot && git init && claude`. Tell Claude Code: Create a Next.js support chatbot with TypeScript and Tailwind. The architecture should include a chat UI on the frontend, API routes that handle messages, a knowledge base stored as markdown files in a content/ directory, a vector search system for finding relevant knowledge base entries, and integration with the Anthropic API for generating responses. Start with the project structure and basic chat UI.

Claude Code will scaffold the application with proper separation of concerns. We will build each component step by step in the following sections.
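The five components and how a message flows through them can be sketched as TypeScript interfaces. Everything here is illustrative, not the generated project code — the names, shapes, and the escalation fallback are assumptions:

```typescript
// A minimal sketch of the five-component pipeline. All names and shapes
// are hypothetical; the real project wires these through API routes.
interface Chunk { id: string; text: string; score: number }

interface Retriever { search(query: string, k: number): Chunk[] }
interface Generator { reply(context: Chunk[], history: string[], msg: string): string }
interface Escalator { shouldEscalate(chunks: Chunk[]): boolean }

// Conversation manager: wires retrieval -> generation, falling back to
// human escalation when the escalator says retrieval confidence is too low.
function handleMessage(
  msg: string,
  history: string[],
  retriever: Retriever,
  generator: Generator,
  escalator: Escalator,
): { text: string; escalated: boolean } {
  const chunks = retriever.search(msg, 5);
  if (escalator.shouldEscalate(chunks)) {
    return { text: "Let me connect you with a human agent.", escalated: true };
  }
  return { text: generator.reply(chunks, history, msg), escalated: false };
}
```

The point of the sketch is the separation of concerns: each component can be swapped (a different vector store, a different model) without touching the others.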
Building and indexing the knowledge base
The knowledge base is the foundation of your bot's accuracy. Create a content/ directory with markdown files organized by topic. Ask Claude Code: Create a knowledge base structure with files for getting started, billing, troubleshooting, API reference, and account management. Write realistic content for each file, imagining we are building a support bot for a project management SaaS tool. Each file should be 500 to 1000 words with clear headings, step-by-step instructions, and common questions answered.

Claude Code will generate comprehensive documentation. Review and edit it — the quality of your knowledge base directly determines the quality of your bot's answers.

Next, build the indexing system. Ask Claude Code: Create a script that reads all markdown files in content/, splits them into chunks of roughly 500 tokens each, generates vector embeddings for each chunk using an embedding model, and stores the chunks and their embeddings in a SQLite database using better-sqlite3. The script should be re-runnable so it updates existing chunks and adds new ones without duplicating.

Run the indexing script: `npm run index-knowledge-base`. You should see output showing the number of files processed, chunks created, and embeddings generated. This script only needs to run when you update the knowledge base content. For production you would use a dedicated vector database like Pinecone or Weaviate, but SQLite with cosine similarity search works well for small to medium knowledge bases containing hundreds of documents.
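The chunking step is the part worth understanding before you delegate it. A rough sketch, assuming one token is about 0.75 words and that paragraph boundaries are acceptable split points (the real script would also call an embedding API per chunk and upsert into SQLite):

```typescript
// Hypothetical chunker for the indexing script: groups markdown paragraphs
// into chunks of roughly `maxTokens` tokens, approximating ~0.75 words per
// token. Splitting on paragraph boundaries keeps each chunk coherent.
function chunkMarkdown(markdown: string, maxTokens = 500): string[] {
  const maxWords = Math.floor(maxTokens * 0.75);
  const paragraphs = markdown.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current: string[] = [];
  let words = 0;
  for (const p of paragraphs) {
    const w = p.trim().split(/\s+/).filter(Boolean).length;
    if (words + w > maxWords && current.length > 0) {
      chunks.push(current.join("\n\n"));
      current = [];
      words = 0;
    }
    current.push(p.trim());
    words += w;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Chunk size is a trade-off: smaller chunks retrieve more precisely but lose surrounding context; larger chunks keep context but dilute the similarity signal.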
Implementing retrieval-augmented generation
Now connect the retrieval system to the language model. Ask Claude Code: Create an API route at /api/chat that receives a user message, generates an embedding for the message, searches the knowledge base for the 5 most similar chunks using cosine similarity, constructs a system prompt that includes the retrieved chunks as context, sends the conversation history plus the new message to the Anthropic API, and returns the generated response. The system prompt should instruct the model to only answer based on the provided context, admit when it does not know something, suggest contacting support for complex issues, and maintain a friendly professional tone.

The key to good RAG is the system prompt. It should clearly state: You are a helpful support assistant for our product. Answer questions based ONLY on the following knowledge base articles. If the knowledge base does not contain the answer, say so clearly and suggest the user contact human support. Do not make up information. Be concise and friendly. Then include the retrieved knowledge base chunks as context.

Test with questions that are clearly in the knowledge base — asking how to reset a password should get an accurate step-by-step answer. Test with questions not in the knowledge base — asking about something unrelated should get an honest response saying the information is not available. Test with ambiguous questions that partially match — saying you cannot log in should retrieve troubleshooting content and ask clarifying questions about the specific error.
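The similarity search at the heart of the route is small enough to sketch directly. This is a generic cosine-similarity top-k over stored embeddings; the `StoredChunk` shape is an assumption about what the indexing step wrote to SQLite:

```typescript
// Sketch of the retrieval step inside /api/chat: score every stored chunk
// against the query embedding with cosine similarity and keep the k best.
type StoredChunk = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function topK(query: number[], chunks: StoredChunk[], k = 5) {
  return chunks
    .map((c) => ({ ...c, score: cosine(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

A linear scan like this is fine at the scale discussed above (hundreds of documents); a dedicated vector database only becomes necessary when the scan itself gets slow.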
Building the chat interface
The chat interface needs to feel responsive and professional. Ask Claude Code: Build a chat UI component with a message list showing user and bot messages with different styling, a text input with a send button and Enter key support, a typing indicator while the bot generates a response, automatic scrolling to the latest message, markdown rendering in bot messages for code blocks and lists and links, and a header with the bot name and an online status indicator. Use streaming for bot responses so users see the text appear progressively instead of waiting for the complete response.

Streaming is critical for user experience. Without it, users stare at a typing indicator for 3 to 5 seconds before seeing anything. With streaming, they see the response forming in real time, which feels much more interactive. Ask Claude Code to implement streaming using the Anthropic API's stream mode. The API route should use Server-Sent Events to stream tokens to the frontend. The frontend should append each token to the current message as it arrives.

For message styling: user messages should be right-aligned with a solid background color and bot messages should be left-aligned with a lighter background and an avatar icon. Add timestamps that appear on hover. Add a copy button on bot messages so users can easily copy the response text. For mobile: make the chat full-screen with a floating input bar fixed at the bottom, similar to messaging apps like iMessage or WhatsApp. Test the complete flow from typing a question through seeing the streaming response.
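The frontend half of streaming reduces to one state update per token. A minimal sketch of that reducer, with the `Message` shape as an assumption — in React this would feed `setMessages`:

```typescript
// Sketch of the frontend streaming logic: each incoming SSE token either
// starts a new bot bubble or appends to the one already being streamed.
// Returns a new array so UI frameworks see the state change.
type Message = { role: "user" | "bot"; text: string };

function appendToken(messages: Message[], token: string): Message[] {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "bot") {
    // First token of a new response: open a new bot message bubble.
    return [...messages, { role: "bot", text: token }];
  }
  return [...messages.slice(0, -1), { ...last, text: last.text + token }];
}
```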
Conversation context and memory
A support bot that forgets the previous message is frustrating to use. Ask Claude Code: Add conversation management. Each chat session should have a unique ID stored in the browser. The full conversation history including user messages and bot responses should be sent with each API request so the bot has context for follow-up questions. Store conversations in the database with session_id, messages as a JSON array, created_at, updated_at, and a resolved boolean flag. Add a /api/conversations endpoint to list and retrieve past conversations.

Conversation context enables multi-turn interactions. A user can ask how to upgrade their plan, get instructions, and then follow up asking what happens if they are on a team plan. Without context, the bot would not know the second question relates to plan upgrades.

Implement context window management: the Anthropic API has a token limit. Ask Claude Code: When the conversation exceeds 50 messages, summarize the older messages into a condensed format and include the summary as context instead of the full history. This prevents hitting the API context limit while preserving important information from earlier in the conversation.

Also add suggested follow-up questions: after each bot response, show 2 to 3 clickable question chips below the message. Generate these suggestions based on the current conversation topic. For billing topics suggest questions about pricing, upgrades, and invoices. These suggestions guide users toward productive conversations and reduce the chance of off-topic questions the bot struggles with.
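The summarize-when-too-long strategy can be sketched as a pure function. The `summarize` callback is where the real implementation would make a model call; here it is injected so the shape of the logic stands on its own:

```typescript
// Sketch of context-window management: when history exceeds `limit`
// messages, replace the older turns with one summary message and keep
// the recent turns verbatim. `summarize` is a stand-in for a model call.
type Turn = { role: "user" | "assistant"; content: string };

function buildContext(
  history: Turn[],
  limit: number,
  summarize: (older: Turn[]) => string,
): Turn[] {
  if (history.length <= limit) return history;
  const older = history.slice(0, history.length - limit);
  const recent = history.slice(history.length - limit);
  return [
    { role: "assistant", content: `Summary of earlier conversation: ${summarize(older)}` },
    ...recent,
  ];
}
```

Keeping the most recent turns verbatim matters: follow-up questions almost always refer to the last few exchanges, so those are the turns that cannot afford lossy compression.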
Human escalation and handoff
The AI will not always have the answer. Build a graceful handoff system. Ask Claude Code: Add an escalation system. The bot should offer to connect the user with a human support agent when it cannot find relevant information in the knowledge base, indicated by low similarity scores on all retrieved chunks; when the user explicitly asks for a human with phrases like "talk to a person," "speak to someone," or "human agent"; or when the conversation has gone back and forth more than 5 times without resolution. When escalation triggers, show a form collecting the user name and email, then create a support ticket in the database with the full conversation history. Send an email notification to the support team via Resend. Show the user a confirmation with an estimated response time.

The handoff must preserve context. The support team member who picks up the ticket should see the user question, everything the bot said, the knowledge base articles that were retrieved along with their relevance scores, and exactly where in the conversation escalation occurred.

Ask Claude Code to create an admin page at /admin/tickets that lists all escalated conversations. Each ticket should show user information, the complete conversation, the escalation reason, and status fields for open, in-progress, and resolved. Support agents can update the status and add internal notes.

Test the complete escalation flow end to end: ask a question the bot cannot answer, verify it offers human assistance, fill out the contact form, check that the ticket appears in the database, verify the email notification is sent, and confirm the ticket displays correctly in the admin panel.
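The three triggers combine into one small predicate. A sketch with illustrative thresholds — the similarity cutoff and turn limit are assumptions you would tune against your own QA results:

```typescript
// Hypothetical escalation check combining the three triggers: low retrieval
// confidence, an explicit request for a human, and a long unresolved
// conversation. Thresholds (0.5 similarity, 5 turns) are illustrative.
const HUMAN_PHRASES = ["talk to a person", "speak to someone", "human agent"];

function shouldEscalate(
  message: string,
  topScores: number[],
  turnCount: number,
  minScore = 0.5,
  maxTurns = 5,
): boolean {
  // `every` on an empty array is true, so zero retrieved chunks also escalates.
  const lowConfidence = topScores.every((s) => s < minScore);
  const asksForHuman = HUMAN_PHRASES.some((p) => message.toLowerCase().includes(p));
  return lowConfidence || asksForHuman || turnCount > maxTurns;
}
```

Returning the *reason* alongside the boolean is a worthwhile refinement, since the admin ticket view described above needs to display why escalation fired.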
Testing and quality assurance
A support bot that gives wrong answers is worse than having no bot at all. Build a testing system. Ask Claude Code: Create a test suite for the support bot including unit tests for the retrieval system verifying that billing questions retrieve billing chunks, integration tests for the API route verifying responses reference correct knowledge base content, tests for escalation triggers verifying that low-confidence responses trigger escalation, and tests for conversation context verifying that follow-up questions use previous context. Use Vitest as the testing framework.

Beyond automated tests, create a manual QA spreadsheet. Ask Claude Code: Generate a list of 30 test questions across all knowledge base topics. Include 10 straightforward questions with clear expected answers, 10 edge cases covering ambiguous questions, multi-topic questions, and questions with typos, and 10 questions that should trigger escalation including off-topic questions, complex issues, and explicit requests for human help. Output as a CSV with columns for question, expected topic, expected escalation, and notes.

Run through the CSV manually, scoring each response on accuracy, relevance, tone, and correct escalation behavior. Fix issues by updating the knowledge base content, adjusting the system prompt, or refining the retrieval similarity thresholds. Repeat until accuracy exceeds 90 percent on your test set. This QA process should be repeated every time you update the knowledge base to catch regressions.
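Scoring the manual run is simple enough to automate. A sketch, assuming a result shape that mirrors the CSV columns (the field names are hypothetical):

```typescript
// Sketch of scoring the manual QA run against the 90% target. A question
// passes only if the answer was accurate AND escalation behaved as expected.
type QAResult = { question: string; accurate: boolean; escalatedCorrectly: boolean };

function qaAccuracy(results: QAResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.accurate && r.escalatedCorrectly).length;
  return passed / results.length;
}
```

Requiring both conditions is deliberate: a factually correct answer to a question that should have escalated still counts as a failure, because it hides a gap in the escalation logic.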
Deployment and production monitoring
Deploy to Vercel with the standard process: push to GitHub, connect to Vercel, add environment variables including ANTHROPIC_API_KEY, DATABASE_URL, and RESEND_API_KEY, then deploy. For the database, use a hosted PostgreSQL from Neon or Supabase. Run the knowledge base indexing script against the production database before launching to users.

Add monitoring to track bot performance in production. Ask Claude Code: Add analytics tracking to the chat API. Log every conversation anonymized, response times, retrieval confidence scores, escalation rates, and the most common questions asked. Create an /admin/analytics page showing total conversations per day, week, and month, average response time, escalation rate trend over time, the top 10 most frequently asked questions, and the lowest-confidence responses, which indicate knowledge gaps in your documentation.

The lowest-confidence responses are the most actionable data point — they tell you exactly what content to add to your knowledge base. If users keep asking about a feature and the bot cannot answer well, you need a knowledge base article about that feature.

Set up alerts for an escalation rate exceeding 20 percent, which signals knowledge base gaps; an average response time exceeding 5 seconds, which signals a performance issue; and an error rate exceeding 1 percent, which signals API or database problems. The support bot is now a living system that improves over time as you expand the knowledge base based on real user questions and conversation patterns.
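The alert rules above are mechanical enough to express directly. A sketch with the metric names as assumptions; the thresholds match the ones suggested:

```typescript
// Sketch of the alerting rules: given one day's aggregated metrics,
// return the names of all alerts that should fire. Shape is hypothetical.
type DailyMetrics = {
  escalationRate: number; // fraction of conversations escalated (0..1)
  avgResponseMs: number;  // average time to a complete response
  errorRate: number;      // fraction of requests that errored (0..1)
};

function activeAlerts(m: DailyMetrics): string[] {
  const alerts: string[] = [];
  if (m.escalationRate > 0.2) alerts.push("high-escalation-rate");  // knowledge base gaps
  if (m.avgResponseMs > 5000) alerts.push("slow-responses");        // performance issue
  if (m.errorRate > 0.01) alerts.push("high-error-rate");           // API or database problems
  return alerts;
}
```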