Activation Function
A mathematical function inside a neural network that decides whether a neuron should fire, introducing the non-linearity that lets networks learn complex patterns.
An activation function is a small but essential piece of mathematics inside every neuron of a neural network. Its job is to decide whether and how strongly a neuron should "fire", that is, pass its signal forward to the next layer.
Why activation functions exist
Without activation functions, a neural network would be nothing more than a series of linear calculations stacked on top of each other. No matter how many layers you add, the result would always be a simple linear transformation, useless for learning complex patterns like language, images, or human behaviour. Activation functions introduce non-linearity, which is what gives neural networks their power.
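A quick sketch in NumPy (illustrative values, not from the original text) makes the collapse concrete: two linear layers applied one after the other are exactly equivalent to a single linear layer.

```python
import numpy as np

# Two "layers" of weights with no activation function between them.
W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, -1.0], [2.0, 0.0]])

x = np.array([1.0, -2.0])

# Passing x through both layers in sequence...
two_layer = W2 @ (W1 @ x)

# ...gives the same result as one layer whose weights are W2 @ W1.
collapsed = (W2 @ W1) @ x

print(np.allclose(two_layer, collapsed))  # the depth added nothing
```

However deep the stack, without a non-linearity between layers the whole network folds down to one matrix in this way.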
Common activation functions
- ReLU (Rectified Linear Unit) is the most widely used. It outputs zero for any negative input and passes positive values through unchanged. It is simple, fast, and effective for most tasks.
- Sigmoid squashes values into a range between zero and one. It was popular in early networks and is still used in output layers for binary classification.
- Softmax converts a vector of numbers into probabilities that sum to one. It is used in the final layer of classification models to produce confidence scores across categories.
- Tanh maps values between negative one and one. It centres data around zero, which can help training converge faster.
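The four functions above are each a one-line formula. A minimal NumPy sketch (the sample input is arbitrary, chosen for illustration):

```python
import numpy as np

def relu(x):
    # Zero for any negative input, identity for positives.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtracting the max first is a standard numerical-stability trick.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def tanh(x):
    # Maps values into (-1, 1), centred on zero.
    return np.tanh(x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))           # [0. 0. 3.]
print(sigmoid(0.0))      # 0.5
print(softmax(x).sum())  # ~1.0: softmax outputs behave like probabilities
```

Note that softmax differs from the others: it acts on a whole vector at once rather than on each value independently, which is why it appears only in output layers.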
How they fit into the bigger picture
Each neuron in a network takes inputs, multiplies them by weights, adds a bias, and then passes the result through an activation function. The function determines the neuron's output, which becomes the input for the next layer. During training, backpropagation adjusts the weights so that the right neurons activate for the right patterns.
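The weighted-sum-then-activate step described above can be sketched for a single neuron (the weights and bias here are made-up illustrative values, not learned ones):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# One neuron: multiply inputs by weights, add a bias,
# then pass the result through the activation function.
inputs  = np.array([0.5, -1.0, 2.0])
weights = np.array([0.8, 0.2, -0.5])  # illustrative, not trained
bias    = 0.1

pre_activation = inputs @ weights + bias  # 0.4 - 0.2 - 1.0 + 0.1 = -0.7
output = relu(pre_activation)             # negative input, so the neuron stays silent

print(output)  # 0.0
```

During training, backpropagation nudges `weights` and `bias` so that the pre-activation value crosses the firing threshold for the patterns the neuron should respond to.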
Choosing the right activation function affects how quickly a model trains and how well it performs. Most modern architectures default to ReLU or its variants for hidden layers, with softmax or sigmoid for outputs.
Why This Matters
You do not need to implement activation functions yourself, but understanding them helps you follow technical discussions about model architecture and training. When a vendor says their model uses a novel activation scheme, you can assess whether that claim is meaningful or marketing.
Continue learning in Advanced
This topic is covered in our lesson: How LLMs Actually Work