
Foundation Model

Last reviewed: April 2026

A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks — GPT, Claude, and Llama are all foundation models.

A foundation model is a large AI model trained on broad, diverse data at massive scale that serves as the base for many different applications. The term was coined by Stanford researchers in 2021 to describe models like GPT, Claude, Llama, and Gemini that are not built for one task but can be adapted to thousands.

What makes a model "foundational"

  • Scale — trained on enormous datasets (trillions of tokens of text, billions of images)
  • Breadth — not specialised for any single task but capable across many
  • Adaptability — can be customised for specific applications through prompting, fine-tuning, or other techniques
  • Emergent capabilities — displays abilities that were not explicitly trained for, such as reasoning, translation, or code generation

The paradigm shift

Before foundation models, the AI workflow was: collect task-specific data → train a task-specific model → deploy for that one task. This meant building a separate model for each application.

Foundation models invert this: train one massive general model → adapt it to many tasks with minimal effort. This is why a single model like Claude can write marketing copy, debug code, analyse data, and answer questions — all without separate training for each.
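The inverted workflow can be sketched in a few lines of Python. This is purely illustrative: `call_model` is a stand-in for whatever inference API you actually use, and the model name and prompt templates are made up for the example.

```python
# Illustrative sketch of the foundation-model workflow: one general
# model, adapted to different tasks purely through prompting.
# `call_model` is a placeholder, not a real library call.

BASE_MODEL = "general-foundation-model"  # hypothetical model name

# Task-specific behaviour comes from instructions, not separate models.
TASK_PROMPTS = {
    "summarise": "Summarise the following text in one sentence:\n",
    "translate": "Translate the following text into French:\n",
    "debug": "Find and explain the bug in this code:\n",
}

def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real model call; just echoes its input here."""
    return f"[{model}] response to: {prompt[:40]}..."

def run_task(task: str, user_input: str) -> str:
    # The same base model handles every task; only the prompt changes.
    prompt = TASK_PROMPTS[task] + user_input
    return call_model(BASE_MODEL, prompt)
```

Under the old paradigm, each key in `TASK_PROMPTS` would instead have mapped to a separately trained model; here they all route to the same base model.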

Key foundation models

  • GPT-4, GPT-4o (OpenAI) — text and multimodal generation
  • Claude (Anthropic) — text generation with emphasis on safety and helpfulness
  • Gemini (Google) — multimodal understanding and generation
  • Llama (Meta) — open-weight text generation
  • Stable Diffusion — image generation
  • Whisper (OpenAI) — speech recognition

Risks and considerations

  • Centralisation — a few organisations control the most capable models, creating dependency
  • Homogenisation — when everyone uses the same foundation models, their biases and limitations propagate everywhere
  • Cost — training foundation models costs millions of dollars, limiting who can create them
  • Opacity — the training data and processes of commercial foundation models are often not fully disclosed

Why This Matters

Foundation models are the platform layer of modern AI — nearly every AI application you use is built on one. Understanding this helps you evaluate the rapidly evolving landscape of AI products and make strategic decisions about which models and providers to build on for your organisation.
