
Random Forest

Last reviewed: April 2026

A machine learning algorithm that builds many decision trees and combines their predictions to produce more accurate and reliable results.

A random forest is a machine learning algorithm that works by creating a large number of decision trees — each trained on a slightly different random subset of the data — and then combining their predictions. The "forest" is the collection of trees, and the final answer is determined by majority vote (for classification) or averaging (for regression).

How a single decision tree works

A decision tree is like a flowchart. It asks a series of yes/no questions about your data to arrive at a prediction. For example, a tree predicting customer churn might ask: Is the customer's contract month-to-month? Have they called support more than three times? Is their monthly bill above £80? Each answer leads to the next question until the tree reaches a prediction.
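The churn flowchart described above can be written as nested conditionals. This is a hand-built sketch for illustration only — the questions, thresholds, and function name are hypothetical, and a real tree learns its splits from data:

```python
def predict_churn(month_to_month, support_calls, monthly_bill):
    """Hypothetical hand-built decision tree for customer churn."""
    if month_to_month:                 # first yes/no question
        if support_calls > 3:          # second question
            return "churn"
        return "churn" if monthly_bill > 80 else "stay"
    return "stay"                      # long-term contracts rarely churn

print(predict_churn(True, 5, 60))   # → "churn"
print(predict_churn(False, 0, 20))  # → "stay"
```

Each path from the first question to a final answer is one branch of the tree; a trained tree simply chooses these questions and thresholds automatically.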

The problem with a single decision tree is that it tends to memorise the training data too closely — a problem called overfitting. It performs brilliantly on data it has seen but poorly on new data.

How the forest fixes this

A random forest counters overfitting by building hundreds or thousands of trees, each slightly different:

  • Each tree is trained on a bootstrap sample: a random draw from the training data, with replacement
  • At each split, a tree considers only a random subset of the available features
  • The final prediction is the consensus of all trees: a majority vote for classification, an average for regression

This randomness makes the forest robust. Individual trees might make mistakes, but their errors tend to cancel out when aggregated.
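The steps above can be sketched in a few dozen lines of standard-library Python. This is a teaching sketch, not a production implementation: each "tree" is a one-question decision stump, and the toy churn dataset, thresholds, and function names are all hypothetical:

```python
import random
from collections import Counter

# Toy dataset (hypothetical churn data): each row is
# ((month_to_month, support_calls, monthly_bill), label).
DATA = [
    ((1, 0, 90), "churn"), ((1, 4, 85), "churn"), ((0, 1, 40), "stay"),
    ((0, 0, 30), "stay"), ((1, 5, 95), "churn"), ((0, 2, 50), "stay"),
    ((1, 1, 60), "stay"), ((0, 4, 88), "churn"),
]

def train_stump(rows, feats):
    """Find the single best (feature, threshold) split on the given features."""
    best_score, best_rule = -1, None
    for f in feats:
        for x, _ in rows:
            t = x[f]
            left = [lab for xx, lab in rows if xx[f] <= t]
            right = [lab for xx, lab in rows if xx[f] > t]
            score, labs = 0, []
            for side in (left, right):
                if side:
                    lab, count = Counter(side).most_common(1)[0]
                    score += count          # rows the majority label covers
                    labs.append(lab)
                else:
                    labs.append(None)
            if score > best_score:
                best_score, best_rule = score, (f, t, labs[0], labs[1])
    return best_rule

def predict_stump(rule, x):
    f, t, left_lab, right_lab = rule
    lab = left_lab if x[f] <= t else right_lab
    return lab if lab is not None else (left_lab or right_lab)

def train_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap: with replacement
        feats = rng.sample(range(n_features), 2)    # random feature subset
        forest.append(train_stump(sample, feats))
    return forest

def predict(forest, x):
    """Classification: majority vote across all trees."""
    return Counter(predict_stump(rule, x) for rule in forest).most_common(1)[0][0]

forest = train_forest(DATA)
print(predict(forest, (1, 5, 92)))  # month-to-month, 5 support calls, high bill
```

A real forest uses full decision trees rather than stumps, but the two sources of randomness shown here — bootstrap sampling and random feature subsets — are exactly what makes the trees' errors uncorrelated enough to cancel out in the vote.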

When random forests are used

Random forests excel at structured data problems — spreadsheets, databases, and tabular data. Common applications include:

  • Credit scoring and fraud detection
  • Customer churn prediction
  • Medical diagnosis support
  • Feature importance analysis (identifying which variables matter most)

Strengths and limitations

Random forests are relatively fast to train, robust to noisy data and outliers, and require minimal tuning. They cope well with messy tabular data and naturally rank feature importance. However, they struggle with unstructured data such as images and text, where deep learning methods are superior, and a forest of hundreds of trees is harder to interpret than a single decision tree.
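In practice you would rarely build a forest by hand. A minimal sketch using scikit-learn's RandomForestClassifier (assuming scikit-learn is installed; the dataset here is synthetic, generated purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular dataset: 500 rows, 8 features, 4 of them informative
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
# feature_importances_ ranks which variables the forest relied on most
print("importances:", clf.feature_importances_.round(3))
```

The `feature_importances_` attribute is what powers the feature importance analysis mentioned above: it sums to 1 across features, so each value reads as that variable's share of the forest's decision-making.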


Why This Matters

Random forests remain one of the most reliable algorithms for business analytics and structured data problems. They often outperform more complex approaches on tabular data and are a strong default choice when you need a quick, reliable predictive model. Understanding when to use a random forest versus a neural network helps you avoid over-engineering solutions.

Learn More

Continue learning in Practitioner

This topic is covered in our lesson: Understanding AI Models and When to Use Them