Vector Index
A specialised data structure that enables fast similarity searches across millions or billions of vectors, powering the retrieval behind semantic search and RAG systems.
A vector index is a data structure built to find the most similar vectors among millions or billions of candidates efficiently. It is what makes semantic search, recommendation systems, and retrieval-augmented generation (RAG) fast enough for real-time applications.
The brute-force problem
Finding the most similar vector in a collection requires comparing the query vector against every stored vector. With a million vectors of 1,536 dimensions each, a single query costs over 1.5 billion floating-point operations. Multiply by even ten queries per second and brute force becomes computationally prohibitive.
Vector indices solve this by organising vectors in ways that allow approximate similarity searches without comparing against every candidate.
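To make the baseline concrete, here is what brute-force (flat) search looks like in plain NumPy. This is a minimal sketch on synthetic data, not any particular library's implementation; the names (`db`, `query`, `top5`) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 64)).astype(np.float32)  # toy collection
db /= np.linalg.norm(db, axis=1, keepdims=True)            # normalise for cosine similarity

query = rng.standard_normal(64).astype(np.float32)
query /= np.linalg.norm(query)

# Exact (flat) search: one dot product per stored vector,
# i.e. 10,000 x 64 multiply-adds for a single query.
scores = db @ query
top5 = np.argsort(-scores)[:5]  # indices of the 5 most similar vectors
```

Every stored vector is touched on every query, which is exactly the cost that the index structures below are designed to avoid.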
Major indexing approaches
- IVF (Inverted File Index): Divides the vector space into regions (clusters) using k-means clustering. At query time, only vectors in the nearest clusters are searched. Think of it like looking up a word in a dictionary: you go to the right section first, then search within it.
- HNSW (Hierarchical Navigable Small World): Builds a multi-layered graph where each vector is connected to its nearest neighbours. Search starts at the top layer (coarse) and progressively refines through lower layers (detailed). Currently the most popular approach for its excellent speed-accuracy balance.
- LSH (Locality-Sensitive Hashing): Uses hash functions that map similar vectors to the same bucket. Searching only within matching buckets dramatically reduces the search space.
- Tree-based indices: Structures like Annoy (Approximate Nearest Neighbours Oh Yeah) build binary trees that recursively partition the space. Simple to implement but less effective at very high dimensions.
- Product Quantization (PQ): Compresses vectors by splitting them into sub-vectors and quantising each independently. Reduces memory usage dramatically while maintaining reasonable search quality.
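The IVF idea above can be sketched in a few lines of NumPy: cluster the data, keep an inverted list of vector ids per centroid, and probe only the nearest clusters at query time. This is a toy illustration with assumed names (`ivf_search`, `nprobe`), not how production libraries implement it:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.standard_normal((5_000, 32)).astype(np.float32)

# Build: a few rounds of Lloyd's k-means to get cluster centroids.
k = 16
centroids = db[rng.choice(len(db), k, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for c in range(k):
        members = db[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Inverted lists: vector ids grouped under their nearest centroid.
lists = {c: np.where(assign == c)[0] for c in range(k)}

def ivf_search(q, nprobe=2):
    # Probe only the nprobe clusters whose centroids are closest to q.
    near = np.argsort(((centroids - q) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in near])
    d = ((db[cand] - q) ** 2).sum(-1)
    return cand[np.argmin(d)]
```

With `nprobe=2` only a fraction of the collection is scanned; raising `nprobe` trades speed for recall, up to exact search when every cluster is probed.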
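LSH can be sketched just as briefly. The classic random-hyperplane scheme hashes each vector to a bit string of sign bits, so vectors pointing in similar directions tend to land in the same bucket. Again a toy sketch with illustrative names (`bucket`, `lsh_search`), not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
db = rng.standard_normal((5_000, 32)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)  # unit vectors, cosine similarity

# 8 random hyperplanes -> one sign bit each -> an 8-bit bucket key.
planes = rng.standard_normal((8, 32)).astype(np.float32)

def bucket(v):
    return int("".join("1" if s > 0 else "0" for s in planes @ v), 2)

buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(bucket(v), []).append(i)

def lsh_search(q):
    # Search only the vectors that hash to the same bucket as the query.
    cand = buckets.get(bucket(q), [])
    if not cand:
        return None
    sims = db[cand] @ q
    return cand[int(np.argmax(sims))]
```

With 8 bits there are at most 256 buckets, so each query scans roughly 1/256th of the data; real systems use multiple hash tables to recover the recall lost when a true neighbour falls in a different bucket.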
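The compression step of product quantisation can also be shown directly. Below, 32-dimensional float vectors (128 bytes each) are encoded as 4 one-byte codes, a 32x reduction. For brevity the codebooks are random samples of the data rather than trained k-means centroids, so treat this purely as an illustration of the encoding:

```python
import numpy as np

rng = np.random.default_rng(4)
db = rng.standard_normal((5_000, 32)).astype(np.float32)

m, ksub = 4, 256              # 4 sub-vectors of 8 dims, 256 centroids each
sub = db.reshape(len(db), m, 32 // m)

# Toy codebooks: random sub-vectors stand in for trained k-means centroids.
books = np.stack([sub[rng.choice(len(db), ksub, replace=False), j]
                  for j in range(m)])

# Each sub-vector is replaced by the id of its nearest codebook entry,
# so every vector becomes m = 4 bytes instead of 128.
codes = np.stack([
    np.argmin(((sub[:, j, None] - books[j][None]) ** 2).sum(-1), axis=1)
    for j in range(m)
], axis=1).astype(np.uint8)
```

At query time, distances are approximated from small per-codebook lookup tables instead of the original floats, which is what keeps billion-scale indices in memory.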
Exact versus approximate search
Most vector indices provide approximate nearest neighbour (ANN) search: they find vectors that are very probably the closest, but not guaranteed to be. The trade-off is dramatic: exact search on a million vectors might take 100 milliseconds, while ANN search with 99% recall takes 1 millisecond.
For most applications, approximate search is more than sufficient. The minor loss in recall is vastly outweighed by the performance improvement.
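Recall is the standard way to quantify that loss: the fraction of the true nearest neighbours an approximate search actually returns. A minimal sketch, using a deliberately crude "approximate" method (exact search over a random half of the data) purely to show how the metric is computed:

```python
import numpy as np

rng = np.random.default_rng(3)
db = rng.standard_normal((2_000, 16)).astype(np.float32)
q = rng.standard_normal(16).astype(np.float32)

k = 10
# Ground truth: the k exact nearest neighbours by squared distance.
exact = set(np.argsort(((db - q) ** 2).sum(-1))[:k])

# Toy "approximate" result: exact search over a random half of the data.
subset = rng.choice(len(db), len(db) // 2, replace=False)
d = ((db[subset] - q) ** 2).sum(-1)
approx = set(subset[np.argsort(d)[:k]])

# recall@k: fraction of the true neighbours the approximate search found.
recall = len(exact & approx) / k
```

Real ANN indices are tuned (via parameters like HNSW's `ef` or IVF's `nprobe`) until recall on a held-out query set reaches a target such as 0.95 or 0.99.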
Choosing an index
The right index depends on your requirements:
- Speed priority: HNSW offers the fastest query times but uses more memory.
- Memory priority: IVF with product quantisation achieves good performance with minimal memory.
- Simplicity priority: Flat (brute-force) search works fine for collections under 100,000 vectors.
- Scale priority: Distributed indices (like those in Pinecone, Weaviate, or Qdrant) handle billions of vectors across multiple machines.
The practical impact
Vector indices are what make AI-powered search feel instant. When you search a knowledge base using natural language and get relevant results in milliseconds, a vector index is doing the heavy lifting behind the scenes. Without efficient indexing, semantic search would be too slow for interactive use.
Why This Matters
Vector indices power the retrieval layer in AI applications from semantic search to RAG. Understanding their trade-offs helps you design AI systems that are both fast and accurate, and evaluate vector database vendors based on the indexing approaches they support.