Activation Function
A mathematical function inside a neural network that decides whether a neuron should fire, introducing the non-linearity that lets networks learn complex patterns.
An activation function is a small but essential piece of mathematics inside every neuron of a neural network. Its job is to decide whether and how strongly a neuron should "fire", that is, pass its signal forward to the next layer.
Why activation functions exist
Without activation functions, a neural network would be nothing more than a series of linear calculations stacked on top of each other. No matter how many layers you add, the result would always be a simple linear transformation, useless for learning complex patterns like language, images, or human behaviour. Activation functions introduce non-linearity, which is what gives neural networks their power.
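A quick sketch in NumPy (illustrative values, not from the original text) makes the collapse concrete: two linear layers applied one after the other are exactly equivalent to a single linear layer.

```python
import numpy as np

# Two "layers" of weights with no activation function between them.
W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, -1.0], [2.0, 0.0]])

x = np.array([1.0, -2.0])

# Passing x through both layers in sequence...
two_layer = W2 @ (W1 @ x)

# ...gives the same result as one layer whose weights are W2 @ W1.
collapsed = (W2 @ W1) @ x

print(np.allclose(two_layer, collapsed))  # the depth added nothing
```

However deep the stack, without a non-linearity between layers the whole network folds down to one matrix in this way.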
Common activation functions
- ReLU (Rectified Linear Unit) is the most widely used. It outputs zero for any negative input and passes positive values through unchanged. It is simple, fast, and effective for most tasks.
- Sigmoid squashes values into a range between zero and one. It was popular in early networks and is still used in output layers for binary classification.
- Softmax converts a vector of numbers into probabilities that sum to one. It is used in the final layer of classification models to produce confidence scores across categories.
- Tanh maps values between negative one and one. It centres data around zero, which can help training converge faster.
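The four functions above are each a one-line formula. A minimal NumPy sketch (the sample input is arbitrary, chosen for illustration):

```python
import numpy as np

def relu(x):
    # Zero for any negative input, identity for positives.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the max first is a standard numerical-stability trick.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def tanh(x):
    # Maps values into (-1, 1), centred on zero.
    return np.tanh(x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))           # [0. 0. 3.]
print(sigmoid(0.0))      # 0.5
print(softmax(x).sum())  # ~1.0: softmax outputs behave like probabilities
```

Note that softmax differs from the others: it acts on a whole vector at once rather than on each value independently, which is why it appears only in output layers.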
How they fit into the bigger picture
Each neuron in a network takes inputs, multiplies them by weights, adds a bias, and then passes the result through an activation function. The function determines the neuron's output, which becomes the input for the next layer. During training, backpropagation adjusts the weights so that the right neurons activate for the right patterns.
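The weighted-sum-then-activate step described above can be sketched for a single neuron (the weights and bias here are made-up illustrative values, not learned ones):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# One neuron: multiply inputs by weights, add a bias,
# then pass the result through the activation function.
inputs  = np.array([0.5, -1.0, 2.0])
weights = np.array([0.8, 0.2, -0.5])  # illustrative, not trained
bias    = 0.1

pre_activation = inputs @ weights + bias  # 0.4 - 0.2 - 1.0 + 0.1 = -0.7
output = relu(pre_activation)             # negative input, so the neuron stays silent

print(output)  # 0.0
```

During training, backpropagation nudges `weights` and `bias` so that the pre-activation value crosses the firing threshold for the patterns the neuron should respond to.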
Choosing the right activation function affects how quickly a model trains and how well it performs. Most modern architectures default to ReLU or its variants for hidden layers, with softmax or sigmoid for outputs.
Why This Matters
You do not need to implement activation functions yourself, but understanding them helps you follow technical discussions about model architecture and training. When a vendor says their model uses a novel activation scheme, you can assess whether that claim is meaningful or marketing.
Continue learning in Advanced
This topic is covered in our lesson: How LLMs Actually Work