
Shadow Deployment

Last reviewed: April 2026

A testing strategy where a new AI model runs alongside the existing one in production, processing the same requests but without its responses reaching users.

Shadow deployment (also called shadow testing or dark launching) is a strategy for testing a new AI model in production conditions without exposing its outputs to users. The new model processes real requests in parallel with the existing model, but only the existing model's responses are shown to users.

Why shadow deployment matters

Testing AI models in staging environments has limits. Staging data is often simulated or sampled, traffic patterns are artificial, and edge cases are underrepresented. Shadow deployment lets you test against real production traffic with zero risk to users.

How it works

  1. The existing (champion) model continues serving users as normal.
  2. Every incoming request is duplicated and sent to the new (challenger) model.
  3. Both models process the request independently.
  4. Only the champion model's response goes to the user.
  5. Both responses are logged for comparison and analysis.
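The five steps above can be sketched in a few lines of Python. This is a minimal synchronous sketch; `handle_request`, the model callables, and the log record shape are illustrative assumptions, and a real system would invoke the challenger asynchronously so it can never slow down or fail the user-facing path.

```python
import time

def handle_request(request, champion, challenger, log):
    """Serve the champion's response; shadow the challenger and log both."""
    start = time.perf_counter()
    champion_response = champion(request)
    champion_latency = time.perf_counter() - start

    # Shadow call: errors and latency are recorded but never reach the user.
    try:
        start = time.perf_counter()
        challenger_response = challenger(request)
        challenger_latency = time.perf_counter() - start
        challenger_error = None
    except Exception as exc:
        challenger_response, challenger_latency = None, None
        challenger_error = repr(exc)

    log.append({
        "request": request,
        "champion": {"response": champion_response,
                     "latency_s": champion_latency},
        "challenger": {"response": challenger_response,
                       "latency_s": challenger_latency,
                       "error": challenger_error},
    })
    return champion_response  # only the champion's output reaches the user
```

Because the return value is always the champion's response, a challenger that crashes, stalls, or produces garbage is invisible to users while still being fully observable in the log.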

What you learn from shadow deployment

  • Quality comparison: How does the new model's output compare to the existing model's? Better, worse, or different?
  • Latency impact: How fast is the new model under real production load?
  • Error rates: Does the new model fail on inputs the existing model handles?
  • Cost implications: What will the new model cost at production scale?
  • Edge cases: How does the new model handle unusual inputs that staging testing might miss?
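Most of these signals fall out of simple aggregation over the paired log records. A sketch, assuming each record holds champion and challenger sub-records with response, latency, and error fields (the field names are illustrative, not from any particular framework):

```python
def summarize_shadow_log(log):
    """Aggregate latency, error-rate, and agreement stats from paired records.

    Assumes each record looks like:
      {"champion":   {"response": ..., "latency_s": float},
       "challenger": {"response": ..., "latency_s": float or None,
                      "error": str or None}}
    """
    def avg(xs):
        return sum(xs) / len(xs) if xs else 0.0

    total = len(log)
    ok = [r for r in log if r["challenger"]["error"] is None]
    return {
        "requests": total,
        # Does the challenger fail on inputs the champion handled?
        "challenger_error_rate": (total - len(ok)) / total if total else 0.0,
        # Latency under real production load, side by side.
        "avg_champion_latency_s": avg([r["champion"]["latency_s"] for r in log]),
        "avg_challenger_latency_s": avg([r["challenger"]["latency_s"] for r in ok]),
        # Crude quality signal: how often do the two outputs agree exactly?
        "exact_match_rate": avg(
            [1.0 if r["champion"]["response"] == r["challenger"]["response"] else 0.0
             for r in ok]
        ),
    }
```

Exact-match rate is a deliberately crude proxy; the "Comparing shadow results" section below covers richer comparisons.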

Shadow deployment patterns

  • Full shadow: Every request goes to both models. Comprehensive, but roughly doubles inference compute costs.
  • Sampled shadow: A percentage of requests (e.g., 10%) go to the new model. Cheaper but less comprehensive.
  • Conditional shadow: Only specific types of requests go to the new model (e.g., only English-language queries, or only queries from a specific feature).
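All three patterns reduce to a single routing decision made per request. A minimal sketch, assuming requests are dicts with an `id` field; `should_shadow` and the mode names are illustrative:

```python
import hashlib

def should_shadow(request, mode="sampled", sample_pct=10, predicate=None):
    """Decide whether a request is also duplicated to the challenger.

    mode="full":        every request is shadowed.
    mode="sampled":     a deterministic sample_pct% of requests are shadowed.
    mode="conditional": only requests matching `predicate` are shadowed.
    """
    if mode == "full":
        return True
    if mode == "sampled":
        # Hash-based bucketing keeps the decision deterministic per request
        # id, which makes sampled shadow traffic reproducible when debugging.
        key = str(request.get("id", "")).encode()
        bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
        return bucket < sample_pct
    if mode == "conditional":
        return bool(predicate and predicate(request))
    raise ValueError(f"unknown shadow mode: {mode!r}")
```

Deterministic hashing (rather than `random.random()`) is a common choice here because the same request id always gets the same decision, so a problematic request can be replayed through the same path.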

Comparing shadow results

  • Automated evaluation: Use metrics like semantic similarity, format compliance, and response length to compare outputs programmatically.
  • Human evaluation: Sample paired outputs for human reviewers to judge quality.
  • Business metric correlation: Map model outputs to business outcomes (conversion rates, satisfaction scores) to identify the better model.
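The automated checks can be as simple as a handful of programmatic signals per output pair. The sketch below uses token-overlap (Jaccard) as a cheap stand-in for semantic similarity; a real pipeline would typically use embedding cosine similarity instead. `compare_outputs` and its fields are illustrative names:

```python
import json

def compare_outputs(champion_out, challenger_out, expect_json=False):
    """Crude programmatic comparison of one paired shadow sample."""
    result = {}

    # Length ratio: large divergence can flag truncation or rambling.
    a, b = len(champion_out), len(challenger_out)
    result["length_ratio"] = min(a, b) / max(a, b) if max(a, b) else 1.0

    # Token-overlap (Jaccard) as a cheap similarity proxy.
    ta = set(champion_out.lower().split())
    tb = set(challenger_out.lower().split())
    result["token_jaccard"] = len(ta & tb) / len(ta | tb) if ta | tb else 1.0

    # Format compliance: does each output parse as the expected structure?
    if expect_json:
        def parses(s):
            try:
                json.loads(s)
                return True
            except ValueError:
                return False
        result["format_ok"] = (parses(champion_out), parses(challenger_out))
    return result
```

Pairs that score poorly on these cheap checks are natural candidates to route to the human-evaluation sample, so reviewer time is spent where the models actually disagree.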

From shadow to live

Once shadow testing confirms the new model meets quality, latency, and cost requirements, you can gradually shift traffic from the champion to the challenger, typically starting with a small percentage and increasing it over days or weeks.
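One common shape for that ramp is a multiplicative schedule with a cap, pausing at each step to check metrics before advancing. A small sketch (the function name and defaults are illustrative):

```python
def ramp_schedule(start_pct=5, factor=2, cap=100):
    """Yield challenger traffic percentages for a gradual rollout.

    Each step would typically be held for hours or days while quality,
    latency, and error metrics are re-checked before advancing.
    """
    pct = start_pct
    while pct < cap:
        yield pct
        pct = min(pct * factor, cap)
    yield cap
```

For example, `list(ramp_schedule(5, 2, 100))` produces the steps 5, 10, 20, 40, 80, 100. Crucially, any step can be rolled back to 0% if the challenger's live metrics regress relative to what shadow testing predicted.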


Why This Matters

Shadow deployment is the safest way to validate AI model changes in production. It eliminates the risk of degrading user experience while providing the most realistic evaluation possible. Any organisation running AI in production should consider shadow deployment as part of their model update process.

