BERT (Bidirectional Encoder Representations from Transformers)
A landmark language model from Google that reads text in both directions simultaneously, dramatically improving how machines understand context and meaning in natural language.
BERT, short for Bidirectional Encoder Representations from Transformers, is a language model developed by Google in 2018 that fundamentally changed how machines understand text. Before BERT, most language models read text in one direction, either left to right or right to left. BERT reads in both directions at once, giving it a far richer understanding of context.
Why bidirectional matters
Consider the word "bank" in two sentences: "I walked along the river bank" and "I deposited money at the bank." A left-to-right model interprets each word using only the words before it, so when the disambiguating context arrives later in the sentence (as in "The bank was overgrown with reeds"), it has to commit to a representation of "bank" before seeing the evidence. BERT reads the entire sentence simultaneously, using context on both sides of a word, so it can tell that "bank" refers to a financial institution in one case and a riverbank in the other.
This bidirectional approach is what made BERT so effective at understanding the meaning behind search queries, questions, and documents, tasks where context is everything.
How BERT is trained
BERT uses two clever training techniques:
- Masked Language Modelling (MLM): Random words in a sentence are hidden (masked), and the model must predict them from the surrounding context. This forces BERT to build a deep understanding of how words relate to each other.
- Next Sentence Prediction (NSP): The model is given two sentences and must determine whether the second logically follows the first. This teaches BERT to understand relationships between sentences.
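The masking step in MLM can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's exact recipe: the real procedure selects about 15% of tokens, and of those replaces 80% with [MASK], 10% with a random token, and leaves 10% unchanged. The helper name `mask_tokens` is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels).

    labels holds the original token at masked positions and None
    elsewhere, mirroring how the MLM loss is computed only over
    the masked positions. Simplified: every selected token becomes
    [MASK] (the real BERT recipe is an 80/10/10 mix).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)   # hide the token
            labels.append(tok)          # the model must predict this
        else:
            masked.append(tok)
            labels.append(None)         # excluded from the loss
    return masked, labels

tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
```

Because the model never knows which positions are masked ahead of time, it is pushed to build useful representations for every word from its surrounding context.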
BERT's impact on search
Google integrated BERT into its search engine in 2019, calling it the biggest improvement to search in five years. It allowed Google to better understand conversational queries, the kind of natural language questions people actually type. For example, it helped Google recognise that the query "can you get medicine for someone pharmacy" is about picking up a prescription for another person.
BERT versus GPT
BERT and GPT represent two different design philosophies. BERT is an encoder model: it excels at understanding and classifying text. GPT is a decoder model: it excels at generating text. BERT reads the whole input at once to understand it; GPT generates text one token at a time. Most modern AI assistants like ChatGPT and Claude are based on the generative (decoder) approach, but BERT's influence on the field was enormous.
Where BERT is used today
BERT and its descendants (RoBERTa, DistilBERT, ALBERT) are widely used in enterprise applications: search engines, document classification, sentiment analysis, question answering, and named entity recognition. They are smaller and cheaper to run than large generative models, making them ideal for specific understanding tasks where you do not need text generation.
Why This Matters
BERT is the model that made AI-powered search actually useful. Understanding its role helps you appreciate why modern search, document classification, and text analysis tools work so much better than their predecessors, and when a BERT-style model might be more cost-effective than a large generative model for your specific use case.