Semantic Chunking
A technique for splitting documents into meaningful segments based on topic or meaning rather than arbitrary character counts, improving the quality of AI retrieval and analysis.
Semantic chunking is a document processing technique that splits text into segments based on meaning and topic rather than arbitrary character or token counts. It is a critical component of retrieval-augmented generation (RAG) systems, where the quality of chunks directly affects the quality of AI responses.
The problem with naive chunking
The simplest approach to chunking is splitting text at fixed intervals, say every 500 tokens. This is fast and easy to implement, but it creates several problems:
- Broken context: A paragraph explaining a concept might be split in the middle, with the setup in one chunk and the conclusion in another.
- Mixed topics: A single chunk might contain the end of one section and the beginning of another, so it matches queries about either topic poorly.
- Lost structure: Document structure (headers, sections, lists) is ignored, losing valuable organisational information.
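To make the failure mode concrete, here is a minimal sketch of naive fixed-size chunking (the function name and sizes are illustrative, not from any particular library):

```python
def naive_chunk(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into fixed-size character chunks, ignoring meaning entirely.

    A sentence or paragraph that straddles a boundary is cut in half,
    leaving its setup in one chunk and its conclusion in the next.
    """
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


doc = "Revenue grew strongly this quarter. It increased by 40% over baseline."
chunks = naive_chunk(doc, chunk_size=40)
# The second sentence can end up split from the first, so "It" loses its referent.
```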
How semantic chunking works
Semantic chunking uses meaning to determine where to split:
- Embedding-based splitting: Compute embeddings for successive sentences or paragraphs. When the cosine similarity between consecutive segments drops below a threshold, insert a split: the topic has changed.
- Structure-aware splitting: Use document structure (headings, paragraphs, section breaks) as natural split points, keeping related content together.
- LLM-based splitting: Use a language model to identify topic boundaries and determine optimal split points.
- Hybrid approaches: Combine structural and semantic signals: split at paragraph boundaries, but merge short paragraphs about the same topic and split long paragraphs that cover multiple topics.
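The embedding-based strategy above can be sketched in a few lines. This toy version uses a bag-of-words vector in place of a real embedding model (a production system would call a sentence-embedding model instead); the function names and the 0.2 threshold are illustrative assumptions:

```python
import math
from collections import Counter


def embed(sentence: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a neural sentence encoder.
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_split(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk whenever similarity to the previous sentence
    drops below the threshold, signalling a likely topic change."""
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])       # topic changed: open a new chunk
        else:
            chunks[-1].append(cur)     # same topic: extend the current chunk
    return chunks
```

With real embeddings the same control flow applies; only `embed` and the threshold change.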
Why chunk quality matters for RAG
In a RAG system, the retrieval step finds chunks that are relevant to the user's query. If chunks are poorly constructed:
- A relevant answer might be split across multiple chunks, none of which individually matches the query well enough to be retrieved.
- Irrelevant information mixed into a chunk adds noise that confuses the model.
- Missing context makes it impossible for the model to formulate a complete answer.
Chunk size considerations
- Too small: Individual sentences lose context. "It increased by 40%" means nothing without knowing what "it" refers to.
- Too large: Chunks contain too much irrelevant information, diluting the relevant content and consuming context window space.
- Sweet spot: Typically 200-500 tokens, though this varies by document type and use case.
Advanced techniques
- Overlapping chunks: Create chunks that overlap by a percentage, ensuring that information near chunk boundaries is captured in multiple chunks.
- Hierarchical chunking: Create chunks at multiple granularities (paragraph, section, document) and retrieve at the most appropriate level.
- Parent-child chunking: Index small chunks for precise retrieval but return the parent chunk (which includes surrounding context) to the model.
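The overlapping-chunks idea above reduces to a sliding window. A minimal sketch, assuming token lists as input and illustrative window sizes:

```python
def overlapping_chunks(tokens: list[str], size: int = 200, overlap: int = 50) -> list[list[str]]:
    """Slide a window of `size` tokens, stepping by `size - overlap`,
    so content near a boundary appears in two consecutive chunks."""
    step = size - overlap
    chunks = []
    for i in range(0, len(tokens), step):
        chunks.append(tokens[i:i + size])
        if i + size >= len(tokens):  # last window already reaches the end
            break
    return chunks
```

Larger overlaps give more redundancy (better boundary coverage) at the cost of index size and duplicated retrieval results.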
Why this matters
Semantic chunking is often the difference between a RAG system that gives accurate, contextual answers and one that produces vague or incomplete responses. Understanding chunking strategies helps you diagnose and fix quality issues in AI-powered knowledge systems.