Hyperparameter Tuning
The process of finding the optimal configuration settings for an AI model: the knobs you set before training begins that determine how the model learns.
Hyperparameter tuning is the process of finding the best configuration settings for a machine learning model. Unlike model parameters (which are learned during training), hyperparameters are set before training begins and control how the learning process itself works.
Parameters versus hyperparameters
- Parameters: The values the model learns during training, such as the weights in a neural network. You do not set these; the training process finds them.
- Hyperparameters: The settings you choose before training starts, such as the learning rate, the number of layers, the batch size, and the dropout rate. These control how training happens.
Think of it as the difference between what a student learns (parameters) and the study method they use (hyperparameters). You can choose the study method; the knowledge is acquired through the process.
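The distinction is easy to see in a toy training loop. In this sketch, everything capitalised is a hyperparameter we choose up front, while `w` is the parameter the loop learns; the data and all values are illustrative.

```python
import random

# Hyperparameters: chosen by us before training starts.
LEARNING_RATE = 0.05
EPOCHS = 100

# Toy data following y = 3x, so the ideal learned parameter is w = 3.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

# Parameter: found by the training loop, not set by us.
w = random.uniform(-1.0, 1.0)

for _ in range(EPOCHS):
    for x, y in data:
        error = w * x - y               # prediction error on this example
        w -= LEARNING_RATE * error * x  # gradient step updates the parameter

print(round(w, 3))  # ends very close to 3.0 regardless of the random start
```

Change `LEARNING_RATE` or `EPOCHS` and the *path* of learning changes; the value being learned is still `w`.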
Common hyperparameters
- Learning rate: How much the model adjusts its weights in response to each error. Too high and the model overshoots; too low and training is painfully slow.
- Batch size: How many training examples the model processes before updating its weights. Larger batches are more stable but use more memory.
- Number of epochs: How many times the model sees the entire training dataset.
- Network architecture: The number of layers, neurons per layer, and activation functions.
- Regularisation strength: How aggressively overfitting is penalised (dropout rate, weight decay coefficient).
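In practice these settings are often gathered into a single configuration object before training starts. The names and values below are illustrative, not a recommendation; frameworks differ in what they call each setting.

```python
# A typical hyperparameter configuration, collected in one place.
config = {
    "learning_rate": 1e-3,        # step size for each weight update
    "batch_size": 32,             # examples processed per weight update
    "epochs": 20,                 # passes over the full training set
    "hidden_layers": [128, 64],   # architecture: neurons per hidden layer
    "dropout_rate": 0.2,          # regularisation: fraction of units dropped
    "weight_decay": 1e-4,         # regularisation: L2 penalty coefficient
}
```

Keeping them in one structure makes it easy to log, compare, and sweep over configurations later.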
Tuning approaches
- Grid search: Try every combination of hyperparameter values from a predefined grid. Thorough but computationally expensive.
- Random search: Try random combinations. Surprisingly effective: because only a few hyperparameters usually matter, random sampling covers many distinct values of each one, and research shows it often finds good configurations faster than grid search.
- Bayesian optimisation: Use a probabilistic model to predict which combinations are likely to perform well, focusing the search on promising regions.
- Population-based training: Run multiple training runs simultaneously, periodically copying hyperparameters from the best-performing runs. Used by DeepMind for large-scale experiments.
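A minimal sketch of the first two approaches, with a stand-in scoring function (a real search would train and validate a model for each candidate; the score function and value ranges here are invented for illustration):

```python
import itertools
import random

def evaluate(lr, batch_size):
    # Stand-in for "train with these settings, return validation score".
    # Peaks at lr = 0.01 and batch_size = 64 by construction.
    return -abs(lr - 0.01) - abs(batch_size - 64) / 1000

grid = {
    "lr": [0.0001, 0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64, 128],
}

# Grid search: every combination on the predefined grid (4 x 4 = 16 runs).
grid_best = max(
    itertools.product(grid["lr"], grid["batch_size"]),
    key=lambda c: evaluate(*c),
)

# Random search: sample 8 combinations, drawing lr from a continuous
# log-uniform range instead of a fixed list.
random.seed(0)
random_best = max(
    ((10 ** random.uniform(-4, -1), random.choice([16, 32, 64, 128]))
     for _ in range(8)),
    key=lambda c: evaluate(*c),
)

print(grid_best)  # (0.01, 64) under this toy score
```

Note that random search evaluates half as many candidates here yet explores learning rates the grid can never reach, which is the usual argument in its favour.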
Why tuning matters
The same model architecture with different hyperparameters can produce wildly different results. A neural network with a learning rate of 0.001 might achieve 95% accuracy, while the same network with a learning rate of 0.1 might fail to learn at all. Proper tuning is often the difference between a model that works and one that does not.
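The learning-rate sensitivity described above can be reproduced exactly on a one-parameter problem. This sketch minimises the toy function f(w) = 10w² by gradient descent; the function and step counts are chosen purely to make the failure modes visible.

```python
def descend(lr, steps=200, w0=5.0):
    """Minimise f(w) = 10 * w**2 by gradient descent; the gradient is 20 * w."""
    w = w0
    for _ in range(steps):
        w -= lr * 20 * w  # each update scales w by (1 - 20 * lr)
    return w

print(abs(descend(0.001)))  # ~0.09: converging, slowly, toward the minimum at 0
print(abs(descend(0.1)))    # 5.0: the step factor is -1, so w oscillates forever
print(abs(descend(0.12)))   # enormous: every step overshoots and training diverges
```

Same "model", same data, three different learning rates: one works, one goes nowhere, one blows up.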
Automated approaches
Modern tools like Optuna, Ray Tune, and Weights & Biases Sweeps automate much of the hyperparameter tuning process. These tools manage the search, track results, and can early-stop unpromising configurations to save compute.
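The early-stopping idea these tools rely on can be sketched without any library. The following is a toy version of successive halving, one common strategy: every round, the training budget doubles and the worst half of the candidates is dropped. The `noisy_score` function and all names here are hypothetical stand-ins, not the API of any real tool.

```python
import random

def noisy_score(config, budget):
    # Stand-in for "train this config for `budget` epochs, return
    # validation score"; better configs score higher as budget grows.
    return config["quality"] * budget / (budget + 5) + random.gauss(0, 0.01)

def successive_halving(configs, budget=1, rounds=3):
    """Double the budget each round; early-stop the worse half of the pool."""
    survivors = list(configs)
    for _ in range(rounds):
        survivors.sort(key=lambda c: noisy_score(c, budget), reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
        budget *= 2
    return survivors[0]

random.seed(42)
pool = [{"id": i, "quality": random.uniform(0.5, 0.95)} for i in range(8)]
best = successive_halving(pool)
print(best["id"])
```

Real libraries add smarter search on top of this pruning (Bayesian suggestion of candidates, distributed execution, result tracking), but the compute saving comes from the same move: stop spending budget on configurations that are already losing.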
Practical advice
Start with established defaults from the literature or framework documentation. Tune the most impactful hyperparameters first (learning rate is almost always the most important). Use validation performance, not training performance, to evaluate configurations. And document everything: reproducibility depends on recording exact hyperparameter values.
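The "document everything" advice can be as simple as serialising the configuration deterministically and deriving a run identifier from it. This is a minimal sketch using only the standard library; the field names are illustrative.

```python
import hashlib
import json

# The exact configuration used for this run, random seed included.
config = {"learning_rate": 3e-4, "batch_size": 32, "epochs": 10, "seed": 42}

# Sorted keys make the record deterministic, so the same configuration
# always yields the same serialised text and the same fingerprint.
blob = json.dumps(config, sort_keys=True, indent=2)
run_id = hashlib.sha256(blob.encode()).hexdigest()[:8]  # short run identifier

print(run_id)  # store `blob` alongside the results under this id
```

With this in place, any reported result can be traced back to the exact hyperparameter values that produced it.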
Why This Matters
Hyperparameter tuning is where much of the "art" in AI model development lies. Understanding this process helps you appreciate why two teams using the same algorithm can get very different results, and why AI development involves systematic experimentation rather than simply pressing a "train" button.