Vector Index
A specialised data structure that enables fast similarity searches across millions or billions of vectors, powering the retrieval behind semantic search and RAG systems.
A vector index is a data structure built to find the most similar vectors among millions or billions of candidates efficiently. It is what makes semantic search, recommendation systems, and retrieval-augmented generation (RAG) fast enough for real-time applications.
The brute-force problem
Finding the most similar vector in a collection requires comparing the query vector against every stored vector. With a million vectors of 1,536 dimensions each, a single query costs over 1.5 billion floating-point operations. Multiply by even ten queries per second and brute force becomes computationally prohibitive.
Vector indices solve this by organising vectors in ways that allow approximate similarity searches without comparing against every candidate.
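To make the baseline concrete, here is what brute-force (flat) search looks like in plain NumPy. This is a minimal sketch on synthetic data, not any particular library's implementation; the names (`db`, `query`, `top5`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)  # toy collection
db /= np.linalg.norm(db, axis=1, keepdims=True)            # normalise for cosine similarity

query = rng.standard_normal(64).astype(np.float32)
query /= np.linalg.norm(query)

# Exact (flat) search: one dot product per stored vector,
# i.e. 10,000 x 64 multiply-adds for a single query.
scores = db @ query
top5 = np.argsort(-scores)[:5]  # indices of the 5 most similar vectors
```

Every stored vector is touched on every query, which is exactly the cost that the index structures below are designed to avoid.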
Major indexing approaches
- IVF (Inverted File Index): Divides the vector space into regions (clusters) using k-means clustering. At query time, only vectors in the nearest clusters are searched. Think of it like looking up a word in a dictionary: you go to the right section first, then search within it.
- HNSW (Hierarchical Navigable Small World): Builds a multi-layered graph where each vector is connected to its nearest neighbours. Search starts at the top layer (coarse) and progressively refines through lower layers (detailed). Currently the most popular approach for its excellent speed-accuracy balance.
- LSH (Locality-Sensitive Hashing): Uses hash functions that map similar vectors to the same bucket. Searching only within matching buckets dramatically reduces the search space.
- Tree-based indices: Structures like Annoy (Approximate Nearest Neighbours Oh Yeah) build binary trees that recursively partition the space. Simple to implement but less effective at very high dimensions.
- Product Quantization (PQ): Compresses vectors by splitting them into sub-vectors and quantising each independently. Reduces memory usage dramatically while maintaining reasonable search quality.
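The IVF idea above can be sketched in a few lines of NumPy: cluster the data, keep an inverted list of vector ids per centroid, and probe only the nearest clusters at query time. This is a toy illustration with assumed names (`ivf_search`, `nprobe`), not how production libraries implement it:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.standard_normal((5_000, 32)).astype(np.float32)

# Build: a few rounds of Lloyd's k-means to get cluster centroids.
k = 16
centroids = db[rng.choice(len(db), k, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        members = db[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Inverted lists: vector ids grouped under their nearest centroid.
lists = {c: np.where(assign == c)[0] for c in range(k)}

def ivf_search(q, nprobe=2):
    # Probe only the nprobe clusters whose centroids are closest to q.
    near = np.argsort(((centroids - q) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in near])
    d = ((db[cand] - q) ** 2).sum(-1)
    return cand[np.argmin(d)]
```

With `nprobe=2` only a fraction of the collection is scanned; raising `nprobe` trades speed for recall, up to exact search when every cluster is probed.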
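LSH can be sketched just as briefly. The classic random-hyperplane scheme hashes each vector to a bit string of sign bits, so vectors pointing in similar directions tend to land in the same bucket. Again a toy sketch with illustrative names (`bucket`, `lsh_search`), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
db = rng.standard_normal((5_000, 32)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit vectors, cosine similarity

# 8 random hyperplanes -> one sign bit each -> an 8-bit bucket key.
planes = rng.standard_normal((8, 32)).astype(np.float32)

def bucket(v):
    return int("".join("1" if s > 0 else "0" for s in planes @ v), 2)

buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(bucket(v), []).append(i)

def lsh_search(q):
    # Search only the vectors that hash to the same bucket as the query.
    cand = buckets.get(bucket(q), [])
    if not cand:
        return None
    sims = db[cand] @ q
    return cand[int(np.argmax(sims))]
```

With 8 bits there are at most 256 buckets, so each query scans roughly 1/256th of the data; real systems use multiple hash tables to recover the recall lost when a true neighbour falls in a different bucket.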
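The compression step of product quantisation can also be shown directly. Below, 32-dimensional float vectors (128 bytes each) are encoded as 4 one-byte codes, a 32x reduction. For brevity the codebooks are random samples of the data rather than trained k-means centroids, so treat this purely as an illustration of the encoding:

```python
import numpy as np

rng = np.random.default_rng(4)
db = rng.standard_normal((5_000, 32)).astype(np.float32)

m, ksub = 4, 256              # 4 sub-vectors of 8 dims, 256 centroids each
sub = db.reshape(len(db), m, 32 // m)

# Toy codebooks: random sub-vectors stand in for trained k-means centroids.
books = np.stack([sub[rng.choice(len(db), ksub, replace=False), j]
                  for j in range(m)])

# Each sub-vector is replaced by the id of its nearest codebook entry,
# so every vector becomes m = 4 bytes instead of 128.
codes = np.stack([
    np.argmin(((sub[:, j, None] - books[j][None]) ** 2).sum(-1), axis=1)
    for j in range(m)
], axis=1).astype(np.uint8)
```

At query time, distances are approximated from small per-codebook lookup tables instead of the original floats, which is what keeps billion-scale indices in memory.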
Exact versus approximate search
Most vector indices provide approximate nearest neighbour (ANN) search: they find vectors that are very probably the closest, but not guaranteed to be. The trade-off is dramatic: exact search on a million vectors might take 100 milliseconds, while ANN search with 99% recall takes 1 millisecond.
For most applications, approximate search is more than sufficient. The minor loss in recall is vastly outweighed by the performance improvement.
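Recall is the standard way to quantify that loss: the fraction of the true nearest neighbours an approximate search actually returns. A minimal sketch, using a deliberately crude "approximate" method (exact search over a random half of the data) purely to show how the metric is computed:

```python
import numpy as np

rng = np.random.default_rng(3)
db = rng.standard_normal((2_000, 16)).astype(np.float32)
q = rng.standard_normal(16).astype(np.float32)

k = 10
# Ground truth: the k exact nearest neighbours by squared distance.
exact = set(np.argsort(((db - q) ** 2).sum(-1))[:k])

# Toy "approximate" result: exact search over a random half of the data.
subset = rng.choice(len(db), len(db) // 2, replace=False)
d = ((db[subset] - q) ** 2).sum(-1)
approx = set(subset[np.argsort(d)[:k]])

# recall@k: fraction of the true neighbours the approximate search found.
recall = len(exact & approx) / k
```

Real ANN indices are tuned (via parameters like HNSW's `ef` or IVF's `nprobe`) until recall on a held-out query set reaches a target such as 0.95 or 0.99.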
Choosing an index
The right index depends on your requirements:
- Speed priority: HNSW offers the fastest query times but uses more memory.
- Memory priority: IVF with product quantisation achieves good performance with minimal memory.
- Simplicity priority: Flat (brute-force) search works fine for collections under 100,000 vectors.
- Scale priority: Distributed indices (like those in Pinecone, Weaviate, or Qdrant) handle billions of vectors across multiple machines.
The practical impact
Vector indices are what make AI-powered search feel instant. When you search a knowledge base using natural language and get relevant results in milliseconds, a vector index is doing the heavy lifting behind the scenes. Without efficient indexing, semantic search would be too slow for interactive use.
Why This Matters
Vector indices power the retrieval layer in AI applications from semantic search to RAG. Understanding their trade-offs helps you design AI systems that are both fast and accurate, and evaluate vector database vendors based on the indexing approaches they support.