Practical

Semantic Router

Last reviewed: April 2026

A system that classifies user queries by meaning and routes them to the appropriate handler — model, prompt, tool, or agent — without requiring keyword matching or rigid rules.

A semantic router analyses the meaning of each incoming user query and sends it to the most appropriate handler: a specific model, prompt template, tool, agent, or response pathway. Unlike traditional keyword-based routing, it compares queries by intent rather than exact wording, so paraphrases and novel phrasings are usually routed correctly.

How semantic routing works

Define routes: Each route corresponds to a handler (a specific prompt, model, tool, or agent) and is associated with a set of example queries that represent the intent it handles.
Embed examples: Convert all example queries into vector embeddings.
Classify incoming queries: When a new query arrives, embed it and compare it against the route examples using cosine similarity.
Route to the best match: The query is sent to the handler associated with the most similar route.
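
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `toy_embed` is a hypothetical stand-in that only counts words, so it captures word overlap rather than meaning, and a real system would replace it with a call to an actual embedding model. The route names and example queries are invented for illustration.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model (assumption): a word-count vector."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: define routes, each with example queries (hypothetical data).
ROUTES = {
    "billing": ["cancel my subscription", "update my payment card", "why was I charged twice"],
    "technical": ["the app crashes on startup", "I cannot log in", "error when uploading a file"],
}

# Step 2: embed all examples once, ahead of time.
ROUTE_VECTORS = {
    name: [toy_embed(q) for q in examples] for name, examples in ROUTES.items()
}

def route(query):
    """Steps 3 and 4: embed the query and return (best route, best similarity)."""
    q = toy_embed(query)
    scored = {
        name: max(cosine(q, v) for v in vectors)
        for name, vectors in ROUTE_VECTORS.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Calling `route("please cancel my subscription")` selects the billing route, because the query is closest to a billing example. Scoring each route by its single most similar example is one possible design choice; averaging over a route's examples is another.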

Why semantic routing is better than keyword matching

Keyword routing breaks with synonyms, rephrasing, and novel expressions:

"Cancel my subscription" and "I want to stop paying" mean the same thing but share no keywords.
"How do I return something?" could be a returns question or a programming question depending on context.

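Semantic routing handles the first case because it operates on meaning, not surface-level text. The second query is ambiguous on its own; a semantic router can resolve it only if the route examples, or the conversation context embedded along with the query, reflect the intended domain.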
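
The first example can be checked directly. The short sketch below (the helper name `shared_keywords` is made up for illustration) shows that the two phrasings have no words in common, which is why a keyword rule written for one never matches the other.

```python
def shared_keywords(a, b):
    """Return the words that two queries have in common (case-insensitive)."""
    return set(a.lower().split()) & set(b.lower().split())

overlap = shared_keywords("Cancel my subscription", "I want to stop paying")
# overlap is an empty set, so a rule keyed on "cancel" or "subscription"
# never fires for the second phrasing.
```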

Use cases

Customer service: Route billing questions to a billing-specialised prompt, technical questions to a technical prompt, and general enquiries to a general prompt.
Multi-model systems: Send simple questions to a fast, cheap model and complex ones to a capable, expensive model.
Tool selection: Determine whether a query requires a calculator, a database lookup, a web search, or just the model's knowledge.
Guardrail routing: Detect potentially harmful or off-topic queries and route them to appropriate handling (refusal, escalation, redirection).
Multilingual routing: Detect the language and route to a model or prompt optimised for that language.

Implementation approaches

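Embedding-based (fastest): Pre-compute embeddings for route examples. At query time, compute one embedding and run a similarity search. For a small set of route examples the search itself can take well under a millisecond, so computing the query embedding usually dominates latency.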
LLM-based classification: Ask an LLM to classify the query into predefined categories. More flexible but slower and more expensive.
Hybrid: Use fast embedding-based routing for clear-cut cases and LLM classification for ambiguous ones.
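
One way to sketch the hybrid approach is to accept the embedding result only when it is clear-cut and otherwise defer to an LLM. In the sketch below, `llm_classify` is a hypothetical callable standing in for whatever LLM call an application uses, and the `min_score` and `min_margin` thresholds are illustrative assumptions that would need tuning on real traffic.

```python
def hybrid_route(scores, llm_classify, query, min_score=0.75, min_margin=0.10):
    """Pick the top embedding route when it is clear-cut, else ask the LLM.

    scores: non-empty dict of route name -> similarity from the embedding router.
    llm_classify: hypothetical callable(query, route_names) -> route name.
    Returns (route name, which method decided).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Clear-cut means a high top score and a comfortable gap to second place.
    clear_cut = best_score >= min_score and (best_score - runner_up) >= min_margin
    if clear_cut:
        return best_name, "embedding"
    return llm_classify(query, list(scores)), "llm"

# Usage with a stub in place of a real LLM call (assumption).
def stub_llm(query, route_names):
    return "technical"

clear = hybrid_route({"billing": 0.91, "technical": 0.40}, stub_llm, "refund please")
ambiguous = hybrid_route({"billing": 0.62, "technical": 0.60}, stub_llm, "it is broken")
```

Here `clear` is decided by the embedding scores alone, while `ambiguous` falls through to the stub because the top two scores are both low and close together.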

Building effective routes

Write 5-20 diverse example queries per route covering different phrasings and styles.
Include edge cases and ambiguous examples with explicit route assignments.
Test with real user queries and iteratively improve routes based on misclassifications.
Monitor routing decisions in production and add new examples when novel phrasings emerge.
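
The testing and monitoring steps above can be supported by a small evaluation helper. The following is a sketch under assumptions: `router` is any callable that maps a query to a route name, and the labelled queries and `stub_router` are invented for illustration.

```python
from collections import Counter

def evaluate(router, labelled_queries):
    """Return accuracy, the misrouted queries, and a count per confusion pair.

    router: callable(query) -> route name.
    labelled_queries: list of (query, expected_route) pairs.
    """
    misses = []
    for query, expected in labelled_queries:
        predicted = router(query)
        if predicted != expected:
            misses.append((query, expected, predicted))
    accuracy = 1 - len(misses) / len(labelled_queries) if labelled_queries else 0.0
    confusion = Counter((expected, predicted) for _, expected, predicted in misses)
    return accuracy, misses, confusion

# Hypothetical router and test set, standing in for real ones.
def stub_router(query):
    return "billing" if "refund" in query else "technical"

test_set = [
    ("I want a refund", "billing"),
    ("app will not open", "technical"),
    ("I was charged twice", "billing"),
]
accuracy, misses, confusion = evaluate(stub_router, test_set)
```

In this run the third query is misrouted, so it appears in `misses`. A query like that is a natural candidate to add as a new example for the billing route.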

Performance characteristics

Embedding-based semantic routers are fast: with a small embedding model running locally, a classification typically takes under 5 milliseconds, whereas calling a remote embedding API adds network latency on top. This makes them suitable for real-time applications where routing overhead must be minimal. They also need far less compute than LLM-based alternatives.

Semantic routing in agent architectures

In multi-agent systems, semantic routing is often the first step: the router determines the user's intent and dispatches the query to the appropriate specialist agent. This is more natural and flexible than requiring users to explicitly choose which agent to interact with.
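
As a rough sketch of this dispatch step, the router's output can be used as a key into a table of agents. All names below (`make_dispatcher`, `stub_classify`, and the agent functions) are hypothetical; a real system would plug in its own router and agents.

```python
def make_dispatcher(classify, agents, fallback):
    """Build a function that routes a query to the matching agent.

    classify: callable(query) -> route name (the semantic router).
    agents: dict of route name -> callable(query) -> response.
    fallback: callable(query) -> response, used when no agent matches.
    """
    def dispatch(query):
        route_name = classify(query)
        agent = agents.get(route_name, fallback)
        return agent(query)
    return dispatch

# Hypothetical stand-ins for the router and the specialist agents.
def stub_classify(query):
    return "billing" if "invoice" in query.lower() else "unknown"

dispatch = make_dispatcher(
    classify=stub_classify,
    agents={"billing": lambda q: f"[billing agent] {q}"},
    fallback=lambda q: f"[general agent] {q}",
)
```

Keeping an explicit fallback means a query that matches no specialist still receives a response instead of raising an error.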

Why This Matters

Semantic routing is a foundational technique for building AI applications that scale beyond simple chatbots. Understanding it helps you design systems that handle diverse user needs efficiently and cost-effectively, routing each query to the right tool for the job.

Related Terms

Embedding

A numerical representation of text (or images, audio, etc.) that captures its meaning. Embeddings let AI measure how similar two pieces of content are.

Cosine Similarity

A mathematical measure of how similar two vectors are by calculating the cosine of the angle between them, widely used in AI to compare documents, images, and search queries.
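
Written out, for two vectors a and b with n components each, the standard definition is:

```latex
\[
\operatorname{cos\_sim}(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
\]
```

For example, with a = (1, 0) and b = (1, 1), the dot product is 1 and the norms are 1 and √2, giving a similarity of about 0.71. Values near 1 mean the vectors point in nearly the same direction, and 0 means they are orthogonal.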

AI Orchestration Layer

The middleware that manages how AI models are selected, invoked, and coordinated within an application — handling routing, fallbacks, retries, and model switching.

Semantic Search

Search that finds results based on meaning and intent rather than exact keyword matches. Powered by vector embeddings that represent concepts as numbers.

Learn More

Continue learning in Practitioner

This topic is covered in our lesson: Building Your First AI Workflow
