Feature Store
A centralised repository for storing, managing, and serving the pre-computed data features used to train and run AI models, ensuring consistency between training and production.
A feature store is a centralised system for managing the features (the input variables) that machine learning models use for both training and prediction. It acts as a shared repository where data teams can store, discover, and reuse pre-computed features across different models and projects.
The problem feature stores solve
In machine learning, features are the processed data inputs that models learn from. For example, a customer churn model might use features like "average monthly spend," "days since last purchase," and "number of support tickets." Creating these features requires data engineering work: querying databases, aggregating data, handling missing values, and computing derived metrics.
Without a feature store, this work is duplicated across teams and projects. The marketing team computes "average monthly spend" one way; the finance team computes it differently. The same problem appears within a single project: the features used to train a model can differ subtly from those served in production. This mismatch, known as "training-serving skew," is a major source of production failures.
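To make the duplication problem concrete, here is a minimal Python sketch of how two teams might compute a feature with the same name and get different answers. The data and both function names are hypothetical, and the two definitions are only one plausible way the computations could diverge.

```python
from statistics import mean

# Hypothetical monthly spend for one customer; 0.0 marks a month with no purchases.
monthly_spend = [120.0, 0.0, 80.0, 0.0, 100.0, 60.0]


def avg_monthly_spend_marketing(spend):
    """Assumed marketing definition: average over every calendar month."""
    return mean(spend)


def avg_monthly_spend_finance(spend):
    """Assumed finance definition: average over months with at least one purchase."""
    active = [s for s in spend if s > 0]
    return mean(active) if active else 0.0


marketing_value = avg_monthly_spend_marketing(monthly_spend)
finance_value = avg_monthly_spend_finance(monthly_spend)

# Same feature name, same raw data, two different values.
print(marketing_value, finance_value)
```

Both functions are reasonable on their own. The trouble starts when a model is trained on one version and fed the other.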
How a feature store works
A feature store typically provides:
- Feature registry: A catalogue of all available features with metadata: who created them, what they mean, and how they are computed.
- Offline store: Stores historical feature values for training models. Answers the question: "What was this customer's average monthly spend six months ago?"
- Online store: Serves feature values in real time for production predictions. Answers the question: "What is this customer's average monthly spend right now?"
- Feature pipelines: Automated processes that compute and update features on a schedule.
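The four components above can be sketched as a toy, in-memory Python class. This is an illustration only, not how any particular product is implemented: the class and method names (ToyFeatureStore, materialize, get_historical, get_online) are invented for this example, and a real system would back each store with a database and run the pipeline on a scheduler.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FeatureDefinition:
    """Registry entry: the feature's metadata plus the one function that computes it."""
    name: str
    owner: str
    description: str
    compute: Callable[[List[float]], float]


@dataclass
class ToyFeatureStore:
    registry: Dict[str, FeatureDefinition] = field(default_factory=dict)
    # Offline store: full timestamped history per (feature, entity).
    offline: Dict[Tuple[str, str], List[Tuple[datetime, float]]] = field(default_factory=dict)
    # Online store: only the most recently written value per (feature, entity).
    online: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def register(self, definition: FeatureDefinition) -> None:
        self.registry[definition.name] = definition

    def materialize(self, feature: str, entity_id: str,
                    raw: List[float], as_of: datetime) -> float:
        """Pipeline step: compute the feature and write it to both stores.

        Assumes runs happen in chronological order, so the last write is the latest value.
        """
        value = self.registry[feature].compute(raw)
        self.offline.setdefault((feature, entity_id), []).append((as_of, value))
        self.online[(feature, entity_id)] = value
        return value

    def get_historical(self, feature: str, entity_id: str,
                       as_of: datetime) -> Optional[float]:
        """Return the value that was current at `as_of`, or None if none existed yet."""
        history = self.offline.get((feature, entity_id), [])
        past = [(ts, v) for ts, v in history if ts <= as_of]
        return max(past, key=lambda p: p[0])[1] if past else None

    def get_online(self, feature: str, entity_id: str) -> Optional[float]:
        return self.online.get((feature, entity_id))


store = ToyFeatureStore()
store.register(FeatureDefinition(
    name="avg_monthly_spend",
    owner="data-team",
    description="Mean spend per calendar month",
    compute=lambda spend: sum(spend) / len(spend),
))

# Two hypothetical pipeline runs, six months apart.
store.materialize("avg_monthly_spend", "cust_1", [100.0, 200.0], datetime(2024, 1, 1))
store.materialize("avg_monthly_spend", "cust_1", [100.0, 200.0, 300.0], datetime(2024, 7, 1))

historical = store.get_historical("avg_monthly_spend", "cust_1", datetime(2024, 3, 1))
current = store.get_online("avg_monthly_spend", "cust_1")
print(historical, current)
```

The point of the design is that both stores are populated by the same registered compute function, so the training path (get_historical) and the serving path (get_online) cannot drift apart in how the feature is defined.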
Why consistency matters
The most critical function of a feature store is ensuring that exactly the same feature definitions are used during training and in production. If a model is trained with "average monthly spend" defined as the mean over the last 12 months but receives the mean over the last 6 months in production, its predictions will be unreliable.
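The 12-month versus 6-month mismatch can be shown in a few lines of Python. The spend figures and the function name are hypothetical; the sketch only illustrates why routing training and serving through one shared definition removes this kind of skew.

```python
from statistics import mean

# Hypothetical 12 months of spend, oldest first: spending tripled in the second half.
monthly_spend = [50.0] * 6 + [150.0] * 6


def avg_monthly_spend(spend, window_months=12):
    """Single shared definition: mean of the most recent `window_months` values."""
    return mean(spend[-window_months:])


# Training uses the shared definition with its default 12-month window.
training_value = avg_monthly_spend(monthly_spend)

# A separately written serving path that quietly uses a 6-month window.
skewed_serving_value = mean(monthly_spend[-6:])

# A serving path that calls the same shared definition as training.
consistent_serving_value = avg_monthly_spend(monthly_spend)

print(training_value, skewed_serving_value, consistent_serving_value)
```

With this data, the skewed serving path reports a value 50% higher than what the model saw in training for the same customer, even though both paths claim to provide "average monthly spend."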
Popular feature stores
- Feast: An open-source feature store that integrates with common data infrastructure.
- Tecton: A managed feature store built by the creators of Uber's Michelangelo ML platform.
- Databricks Feature Store: Integrated with the Databricks lakehouse platform.
- Amazon SageMaker Feature Store: The managed offering from AWS for SageMaker users.
When you need a feature store
Feature stores are most valuable when:
- Multiple teams build models that share common features
- Models require real-time predictions with consistent feature computation
- Feature engineering represents a significant portion of development time
- Training-serving skew has caused production issues
Why This Matters
Feature stores address one of the most common failure points in enterprise AI deployment: the gap between how data is prepared during development and how it is served in production. Understanding this concept helps you evaluate the maturity of your organisation's AI infrastructure.