BERT (Bidirectional Encoder Representations from Transformers)
A landmark language model from Google that reads text in both directions simultaneously, dramatically improving how machines understand context and meaning in natural language.
BERT, short for Bidirectional Encoder Representations from Transformers, is a language model developed by Google in 2018 that fundamentally changed how machines understand text. Before BERT, most language models read text in one direction, either left to right or right to left. BERT reads in both directions at once, giving it a far richer understanding of context.
Why bidirectional matters
Consider the word "bank" in two sentences: "I walked along the river bank" and "I deposited money at the bank." A left-to-right model interprets each word using only the words before it, so when the disambiguating context arrives later in the sentence (as in "The bank was overgrown with reeds"), it has to commit to a representation of "bank" before seeing the evidence. BERT reads the entire sentence simultaneously, using context on both sides of a word, so it can tell that "bank" refers to a financial institution in one case and a riverbank in the other.
This bidirectional approach is what made BERT so effective at understanding the meaning behind search queries, questions, and documents, tasks where context is everything.
How BERT is trained
BERT uses two clever training techniques:
- Masked Language Modelling (MLM): Random words in a sentence are hidden (masked), and the model must predict them from the surrounding context. This forces BERT to build a deep understanding of how words relate to each other.
- Next Sentence Prediction (NSP): The model is given two sentences and must determine whether the second logically follows the first. This teaches BERT to understand relationships between sentences.
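The masking step in MLM can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's exact recipe: the real procedure selects about 15% of tokens, and of those replaces 80% with [MASK], 10% with a random token, and leaves 10% unchanged. The helper name `mask_tokens` is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels).

    labels holds the original token at masked positions and None
    elsewhere, mirroring how the MLM loss is computed only over
    the masked positions. Simplified: every selected token becomes
    [MASK] (the real BERT recipe is an 80/10/10 mix).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)   # hide the token
            labels.append(tok)          # the model must predict this
        else:
            masked.append(tok)
            labels.append(None)         # excluded from the loss
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
```

Because the model never knows which positions are masked ahead of time, it is pushed to build useful representations for every word from its surrounding context.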
BERT's impact on search
Google integrated BERT into its search engine in 2019, calling it the biggest improvement to search in five years. It allowed Google to better understand conversational queries, the kind of natural language questions people actually type. For example, it helped Google recognise that the query "can you get medicine for someone pharmacy" is about picking up a prescription for another person.
BERT versus GPT
BERT and GPT represent two different design philosophies. BERT is an encoder model: it excels at understanding and classifying text. GPT is a decoder model: it excels at generating text. BERT reads the whole input at once to understand it; GPT generates text one token at a time. Most modern AI assistants like ChatGPT and Claude are based on the generative (decoder) approach, but BERT's influence on the field was enormous.
Where BERT is used today
BERT and its descendants (RoBERTa, DistilBERT, ALBERT) are widely used in enterprise applications: search engines, document classification, sentiment analysis, question answering, and named entity recognition. They are smaller and cheaper to run than large generative models, making them ideal for specific understanding tasks where you do not need text generation.
Why This Matters
BERT is the model that made AI-powered search actually useful. Understanding its role helps you appreciate why modern search, document classification, and text analysis tools work so much better than their predecessors, and when a BERT-style model might be more cost-effective than a large generative model for your specific use case.