
Retrieval-Augmented Generation (RAG)

Last reviewed: April 2026

A technique that connects AI to your own documents and data so it can answer questions using your specific information, not just its general training.

Retrieval-augmented generation, or RAG, is a technique that connects an AI model to an external knowledge source — your documents, databases, wikis, or files — so that it can answer questions using your specific information rather than relying solely on its general training data.

The problem RAG solves

LLMs like ChatGPT and Claude are trained on public data. They know a lot about the world in general, but they know nothing about your company's internal policies, your product documentation, your customer data, or your proprietary processes. When you ask about these topics, the AI either admits it does not know or — worse — hallucinates a plausible-sounding but incorrect answer.

RAG solves this by giving the AI access to your data at the moment it generates a response.
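In practice, "giving the AI access to your data at the moment it generates a response" often comes down to careful prompt assembly: the retrieved text is pasted into the prompt ahead of the question. A minimal sketch — the function name and prompt template are illustrative assumptions, not a standard API:

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that grounds the model in retrieved text.

    `chunks` are document excerpts returned by a retrieval step.
    The numbered-source template is one common convention, chosen
    here so the model can cite sources by number.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved chunk, for illustration only.
prompt = build_grounded_prompt(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
print(prompt)
```

The resulting string is what gets sent to the model; everything the model needs to answer is inside the prompt itself, which is why the knowledge source can change without retraining.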

How RAG works

The RAG process has three steps:

  1. Indexing (one-time setup): Your documents are broken into chunks and converted into embeddings — numerical representations that capture the meaning of each chunk. These embeddings are stored in a vector database.
  2. Retrieval (every query): When a user asks a question, the system converts the question into an embedding and searches the vector database for the most relevant document chunks. This is semantic search — it finds content by meaning, not just keyword matching.
  3. Generation (every query): The retrieved chunks are included in the AI's context alongside the user's question. The AI generates a response grounded in your specific documents, often citing the source material.
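The three steps above can be sketched end to end. This toy version stands in a word-count vector for a real learned embedding (so it is really keyword overlap, not true semantic search) and omits the model call in step 3; the chunks and question are invented for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a word-count vector. A real system would call
    an embedding model here; this stand-in keeps the example runnable."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing (one-time setup): embed each chunk and store the vectors.
chunks = [
    "A refund is accepted within 30 days of purchase.",
    "Support is available by email on weekdays.",
    "The Pro plan includes priority onboarding.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieval (every query): embed the question, rank chunks by similarity.
question = "What is the refund window?"
q_vec = embed(question)
best_chunk, _ = max(index, key=lambda item: cosine(q_vec, item[1]))

# 3. Generation (every query): the top chunk would be placed in the
#    model's context alongside the question (model call omitted here).
print(best_chunk)
```

A production pipeline replaces `embed` with an embedding model, the in-memory list with a vector database, and the final `print` with a call to the LLM, but the shape of the flow is the same.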

RAG vs fine-tuning

Both RAG and fine-tuning help AI work with your data, but they solve different problems:

  • RAG is best for factual, up-to-date information retrieval. It is relatively easy to implement, and you can update the knowledge source without retraining the model.
  • Fine-tuning is best for changing the model's behaviour, style, or capabilities. It requires more technical expertise and computing resources.

Think of it this way: RAG gives the AI a reference library to consult. Fine-tuning changes how the AI thinks.

Most business use cases are better served by RAG because:

  • Your data changes frequently (policies, products, pricing)
  • You want the AI to cite sources
  • You need to control exactly what information the AI has access to
  • Implementation is faster and less expensive than fine-tuning

Practical RAG applications

  • Internal knowledge base: Employees ask questions and get answers sourced from company documentation
  • Customer support: AI agents answer customer queries using your product documentation and FAQ
  • Legal research: AI searches through contracts, case law, or compliance documents
  • Sales enablement: AI answers questions about pricing, features, and competitors using your sales materials
  • HR: Employees get instant answers about policies, benefits, and procedures

Quality considerations

RAG quality depends heavily on:

  • Chunking strategy: How you split documents affects whether the AI retrieves the right context
  • Embedding quality: Better embeddings produce more accurate retrieval
  • Relevance ranking: Ensuring the most useful chunks rise to the top
  • Source diversity: Including enough relevant documents to cover the topic
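Chunking strategy is the consideration most teams meet first. A common baseline is fixed-size windows with overlap, so a sentence that straddles a boundary remains retrievable from either side. A minimal sketch — the sizes are illustrative, and production systems often chunk by tokens, sentences, or document structure instead:

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows.

    `size` is the window length and `overlap` is how many characters
    each chunk shares with the previous one; both values here are
    arbitrary defaults for illustration.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "word " * 100  # 500 characters of placeholder text
pieces = chunk_text(doc, size=200, overlap=50)
```

Larger chunks carry more context but dilute the embedding's meaning; smaller chunks retrieve precisely but may lack the surrounding context the model needs. Tuning this trade-off is usually worth more than swapping embedding models.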


Why This Matters

RAG is how most organisations will connect AI to their proprietary knowledge. It transforms AI from a general-purpose assistant into a company-specific expert that knows your products, policies, and processes. For any business sitting on valuable internal documentation that employees struggle to search through — which is nearly every organisation — RAG represents an immediate, high-value application of AI technology.
