Normalisation
The process of scaling numerical data to a standard range or distribution, ensuring that no single feature dominates model training simply because of its scale.
Normalisation (also spelled normalization) rescales numerical data so that features with different units or ranges contribute comparably to model training. Without normalisation, features with large values can dominate the learning process.
Why normalisation matters
Imagine a model using two features: annual salary (ranging from 20,000 to 200,000) and years of experience (ranging from 0 to 40). Without normalisation, salary would have a much larger influence on the model simply because its numbers are bigger – not because it is more important. Normalisation puts all features on an equal footing.
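The salary example can be sketched directly. With made-up values for three people, Euclidean distance is driven almost entirely by the salary axis until both features are min-max scaled, at which point the ordering of "nearest neighbour" flips:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# (annual salary, years of experience) -- illustrative values
a = (30_000, 2)
b = (32_000, 35)  # similar salary, very different experience
c = (60_000, 3)   # different salary, very similar experience

# Unscaled: the salary axis dominates, so b looks far closer to a than c does
assert euclidean(a, b) < euclidean(a, c)

# Min-max scale using the ranges from the text (salary 20k-200k, years 0-40)
def scale(p):
    return ((p[0] - 20_000) / (200_000 - 20_000), p[1] / 40)

# Scaled: the ordering flips -- c, with near-identical experience, is nearer
assert euclidean(scale(a), scale(b)) > euclidean(scale(a), scale(c))
```

The numbers are hypothetical, but the reversal is exactly the failure mode described above: before scaling, a 33-year experience gap matters less than a 2,000-unit salary gap.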
Common normalisation techniques
- Min-max scaling – rescales values to a fixed range, typically 0 to 1. Formula: (value - min) / (max - min). Good when the data has known, stable bounds, but sensitive to outliers.
- Z-score standardisation – centres data at zero with a standard deviation of one. Formula: (value - mean) / standard deviation. Good for data that follows a roughly normal distribution.
- Robust scaling – uses the median and interquartile range instead of the mean and standard deviation. Less affected by outliers.
- Log transformation – applies a logarithm to compress the range of highly skewed data (common for income, prices, or page views).
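The four techniques can be sketched with the standard library alone (the sample values are made up; real pipelines typically reach for a library such as scikit-learn instead):

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

# Min-max scaling: (value - min) / (max - min) -> range [0, 1]
lo, hi = min(data), max(data)
minmax = [(x - lo) / (hi - lo) for x in data]

# Z-score standardisation: (value - mean) / standard deviation
mu = statistics.mean(data)       # 5.0
sigma = statistics.pstdev(data)  # 2.0 (population standard deviation)
zscores = [(x - mu) / sigma for x in data]

# Robust scaling: (value - median) / interquartile range
q1, median, q3 = statistics.quantiles(data, n=4)
robust = [(x - median) / (q3 - q1) for x in data]

# Log transformation for skewed data; log1p(x) = log(1 + x) handles zeros
skewed = [1.0, 10.0, 100.0, 1000.0]
logged = [math.log1p(x) for x in skewed]
```

Note how min-max output always spans exactly [0, 1], while z-scores have no fixed bounds: a single extreme outlier would squash every other min-max value toward zero, which is why robust scaling exists.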
When normalisation is essential
- Distance-based algorithms – k-nearest neighbours, clustering, and support vector machines are highly sensitive to feature scales
- Gradient descent – training converges much faster with normalised features because the loss landscape is more uniform
- Neural networks – batch normalisation and layer normalisation are standard components that normalise activations between layers
- Regularisation – techniques like L1 and L2 regularisation penalise large weights, so features must be on similar scales for fair penalisation
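The gradient-descent point can be made concrete with a toy quadratic loss. When one "feature scale" is 100x the other, the learning rate must stay small enough for the steep direction, so the shallow direction crawls; with equal scales, a much larger rate converges in a handful of steps. This is a hypothetical illustration, not a production training loop:

```python
def steps_to_converge(s1, s2, lr, tol=1e-3, max_steps=200_000):
    """Gradient descent on loss = (s1*w1 - 1)^2 + (s2*w2 - 1)^2,
    where s1 and s2 play the role of feature scales."""
    w1 = w2 = 0.0
    for step in range(1, max_steps + 1):
        w1 -= lr * 2 * s1 * (s1 * w1 - 1)
        w2 -= lr * 2 * s2 * (s2 * w2 - 1)
        if abs(s1 * w1 - 1) < tol and abs(s2 * w2 - 1) < tol:
            return step
    return max_steps

# Mismatched scales: the rate must stay tiny to avoid diverging along the
# steep (s=100) direction, so the flat direction needs tens of thousands of steps
slow = steps_to_converge(100.0, 1.0, lr=0.00004)

# Equal scales: a 10,000x larger rate is stable and converges almost immediately
fast = steps_to_converge(1.0, 1.0, lr=0.4)
assert fast < 10 < slow
```

This is the "loss landscape is more uniform" claim in miniature: normalising the features makes the curvature similar in every direction, so one learning rate works well for all of them.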
When normalisation is unnecessary
- Tree-based models – decision trees, random forests, and gradient boosted trees split on thresholds and are unaffected by feature scale
- Pre-normalised data – data that is already on a consistent scale (percentages, binary flags)
Normalisation in deep learning
Modern neural networks use internal normalisation layers:
- Batch normalisation – normalises activations across a mini-batch
- Layer normalisation – normalises across features within each example (used in transformers)
These techniques stabilise and accelerate training, and are now standard in most architectures.
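The core of layer normalisation is easy to sketch without any framework: each example's activations are shifted to mean zero and scaled to roughly unit variance. Real layers also apply learned scale and shift parameters, which this minimal sketch omits:

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalise one example's activations to mean 0, variance ~1.
    Real implementations also learn a per-feature scale and shift;
    eps guards against division by zero for constant inputs."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

# Whatever the input scale, the output statistics are the same,
# which is what keeps activations stable from layer to layer
small = layer_norm([2.0, 4.0, 6.0, 8.0])
large = layer_norm([2_000.0, 4_000.0, 6_000.0, 8_000.0])
```

Because `small` and `large` normalise to (nearly) identical vectors, the layers downstream see a consistent input distribution regardless of how activations drift during training.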
Why This Matters
Normalisation is a fundamental data preparation step that can dramatically improve model performance with minimal effort. If your data science team skips it, distance-based models and neural networks will perform poorly regardless of how good the underlying data is. It is one of the simplest yet most impactful preprocessing steps.