
Random Forest

Last reviewed: April 2026

A machine learning algorithm that builds many decision trees and combines their predictions to produce more accurate and reliable results.

A random forest is a machine learning algorithm that works by creating a large number of decision trees — each trained on a slightly different random subset of the data — and then combining their predictions. The "forest" is the collection of trees, and the final answer is determined by majority vote (for classification) or averaging (for regression).

How a single decision tree works

A decision tree is like a flowchart. It asks a series of yes/no questions about your data to arrive at a prediction. For example, a tree predicting customer churn might ask: Is the customer's contract month-to-month? Have they called support more than three times? Is their monthly bill above £80? Each answer leads to the next question until the tree reaches a prediction.
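The churn flowchart described above can be written as nested conditionals. This is a hand-built sketch for illustration only — the questions, thresholds, and function name are hypothetical, and a real tree learns its splits from data:

```python
def predict_churn(month_to_month, support_calls, monthly_bill):
    """Hypothetical hand-built decision tree for customer churn."""
    if month_to_month:                 # first yes/no question
        if support_calls > 3:          # second question
            return "churn"
        return "churn" if monthly_bill > 80 else "stay"
    return "stay"                      # long-term contracts rarely churn

print(predict_churn(True, 5, 60))   # → "churn"
print(predict_churn(False, 0, 20))  # → "stay"
```

Each path from the first question to a final answer is one branch of the tree; a trained tree simply chooses these questions and thresholds automatically.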

The problem with a single decision tree is that it tends to memorise the training data too closely — a problem called overfitting. It performs brilliantly on data it has seen but poorly on new data.

How the forest fixes this

A random forest counters overfitting by building hundreds or thousands of trees, each slightly different:

  • Each tree is trained on a bootstrap sample: a random draw from the training data, with replacement
  • At each split, a tree considers only a random subset of the available features
  • The final prediction is the consensus of all trees: a majority vote for classification, an average for regression

This randomness makes the forest robust. Individual trees might make mistakes, but their errors tend to cancel out when aggregated.
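The steps above can be sketched in a few dozen lines of standard-library Python. This is a teaching sketch, not a production implementation: each "tree" is a one-question decision stump, and the toy churn dataset, thresholds, and function names are all hypothetical:

```python
import random
from collections import Counter

# Toy dataset (hypothetical churn data): each row is
# ((month_to_month, support_calls, monthly_bill), label).
DATA = [
    ((1, 0, 90), "churn"), ((1, 4, 85), "churn"), ((0, 1, 40), "stay"),
    ((0, 0, 30), "stay"), ((1, 5, 95), "churn"), ((0, 2, 50), "stay"),
    ((1, 1, 60), "stay"), ((0, 4, 88), "churn"),
]

def train_stump(rows, feats):
    """Find the single best (feature, threshold) split on the given features."""
    best_score, best_rule = -1, None
    for f in feats:
        for x, _ in rows:
            t = x[f]
            left = [lab for xx, lab in rows if xx[f] <= t]
            right = [lab for xx, lab in rows if xx[f] > t]
            score, labs = 0, []
            for side in (left, right):
                if side:
                    lab, count = Counter(side).most_common(1)[0]
                    score += count          # rows the majority label covers
                    labs.append(lab)
                else:
                    labs.append(None)
            if score > best_score:
                best_score, best_rule = score, (f, t, labs[0], labs[1])
    return best_rule

def predict_stump(rule, x):
    f, t, left_lab, right_lab = rule
    lab = left_lab if x[f] <= t else right_lab
    return lab if lab is not None else (left_lab or right_lab)

def train_forest(rows, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap: with replacement
        feats = rng.sample(range(n_features), 2)    # random feature subset
        forest.append(train_stump(sample, feats))
    return forest

def predict(forest, x):
    """Classification: majority vote across all trees."""
    return Counter(predict_stump(rule, x) for rule in forest).most_common(1)[0][0]

forest = train_forest(DATA)
print(predict(forest, (1, 5, 92)))  # month-to-month, 5 support calls, high bill
```

A real forest uses full decision trees rather than stumps, but the two sources of randomness shown here — bootstrap sampling and random feature subsets — are exactly what makes the trees' errors uncorrelated enough to cancel out in the vote.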

When random forests are used

Random forests excel at structured data problems — spreadsheets, databases, and tabular data. Common applications include:

  • Credit scoring and fraud detection
  • Customer churn prediction
  • Medical diagnosis support
  • Feature importance analysis (identifying which variables matter most)

Strengths and limitations

Random forests are relatively fast to train, robust to noisy data and outliers, and require minimal tuning. They cope well with messy tabular data and naturally rank feature importance. However, they struggle with unstructured data such as images and text, where deep learning methods are superior, and a forest of hundreds of trees is harder to interpret than a single decision tree.
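In practice you would rarely build a forest by hand. A minimal sketch using scikit-learn's RandomForestClassifier (assuming scikit-learn is installed; the dataset here is synthetic, generated purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular dataset: 500 rows, 8 features, 4 of them informative
X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
# feature_importances_ ranks which variables the forest relied on most
print("importances:", clf.feature_importances_.round(3))
```

The `feature_importances_` attribute is what powers the feature importance analysis mentioned above: it sums to 1 across features, so each value reads as that variable's share of the forest's decision-making.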


Why This Matters

Random forests remain one of the most reliable algorithms for business analytics and structured data problems. They often outperform more complex approaches on tabular data and are a strong default choice when you need a quick, reliable predictive model. Understanding when to use a random forest versus a neural network helps you avoid over-engineering solutions.

Learn More

Continue learning in Practitioner

This topic is covered in our lesson: Understanding AI Models and When to Use Them