GPT (Generative Pre-trained Transformer)
A family of large language models created by OpenAI that generate human-like text by predicting the next word, forming the foundation of ChatGPT.
GPT stands for Generative Pre-trained Transformer, a family of large language models created by OpenAI. GPT models generate text by predicting the next token in a sequence, trained on vast amounts of text data from the internet. The GPT architecture is the foundation of ChatGPT and one of the most influential developments in AI history.
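Next-token prediction can be sketched with a toy example: given the text so far, the model assigns a probability to each candidate next token, and generation repeatedly appends the most likely (or a sampled) one. The lookup table and probabilities below are invented purely for illustration; a real GPT computes these probabilities with a neural network over a vocabulary of tens of thousands of tokens.

```python
# Toy sketch of next-token prediction (probabilities are made up).
def next_token_probs(context):
    table = {
        "the cat sat on the": {"mat": 0.6, "sofa": 0.3, "moon": 0.1},
    }
    return table.get(context, {"<unk>": 1.0})

context = "the cat sat on the"
probs = next_token_probs(context)
best = max(probs, key=probs.get)   # greedy decoding: pick the most likely token
print(best)  # mat
```

In practice the chosen token is appended to the context and the process repeats, one token at a time, which is why long responses stream out incrementally.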
Breaking down the name
- Generative: The model generates new text rather than just classifying or extracting information
- Pre-trained: The model is first trained on a massive general dataset (the "pre-training" phase) before being adapted for specific tasks
- Transformer: The underlying neural network architecture that enables the model to process text efficiently using self-attention mechanisms
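The self-attention mechanism mentioned above can be sketched minimally in NumPy. This is a single attention head with no learned projections (a real Transformer derives separate query, key, and value vectors from learned weight matrices); it only shows the core idea that each token's output is a similarity-weighted mix of every token in the sequence.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of token vectors.

    x: array of shape (seq_len, d), one embedding per token.
    Simplification: queries, keys, and values are all x itself.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # pairwise similarities, (seq_len, seq_len)
    # softmax over each row so attention weights sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # each output token: weighted mix of all input tokens

tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim embeddings
out = self_attention(tokens)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in parallel, this computation scales well on GPUs, which is a large part of why the Transformer displaced sequential architectures like RNNs.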
The evolution of GPT
- GPT-1 (2018): Demonstrated that pre-training on large text corpora followed by fine-tuning could produce strong language understanding. 117 million parameters.
- GPT-2 (2019): Showed that scaling up (1.5 billion parameters) dramatically improved quality. OpenAI initially withheld the full model over concerns about misuse.
- GPT-3 (2020): A massive leap to 175 billion parameters. Demonstrated few-shot learning: the ability to perform tasks from just a few examples in the prompt, without fine-tuning.
- GPT-4 (2023): Multimodal (text + images), significantly improved reasoning, fewer hallucinations. Exact parameter count undisclosed.
- GPT-4o (2024): Optimised version with faster inference and native multimodal capabilities.
- o1/o3 (2024-2025): Models with explicit reasoning chains for complex problem-solving.
Why GPT matters
GPT demonstrated three crucial insights:
- Scaling works: Larger models trained on more data produce dramatically better results
- Pre-training is powerful: A model trained to predict the next word learns an enormous amount about language, facts, and reasoning
- Few-shot learning is practical: Large models can perform new tasks from just a few examples in the prompt, without any additional training
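Few-shot learning is purely a prompting pattern: the task is specified entirely through worked examples in the input text, and the model completes the pattern. A sketch of such a prompt (the word pairs here are illustrative, in the style of the translation demos popularised with GPT-3):

```python
# A few-shot prompt: no fine-tuning, just examples followed by an
# unfinished line for the model to complete.
prompt = """Translate English to French.

sea otter -> loutre de mer
cheese -> fromage
plush giraffe ->"""

last_line = prompt.splitlines()[-1]
print(last_line)  # plush giraffe ->  (the line the model would complete)
```

The same structure works for classification, extraction, or formatting tasks: a handful of input/output pairs is often enough for a large model to infer the rule.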
GPT vs the competition
GPT is not the only large language model family:
- Claude (Anthropic): Emphasises safety and nuance
- Gemini (Google): Native multimodal design
- Llama (Meta): Open-source, enabling local deployment
- Mistral (Mistral AI): European open-source models
Each family makes different trade-offs between capability, safety, speed, and openness.
GPT as a platform
OpenAI has built GPT into a platform: the API enables developers to integrate GPT into any application, Custom GPTs allow non-technical users to create purpose-built assistants, and the ChatGPT interface provides direct consumer and business access. This platform approach has made GPT one of the most widely deployed AI technologies in the world.
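Programmatic access revolves around sending a list of role-tagged messages to a chat endpoint. The sketch below builds the request payload only (nothing is sent, so no API key is needed); the model name and message contents are illustrative, and the exact fields should be checked against OpenAI's current API reference.

```python
# Shape of a chat request payload (illustrative; not sent anywhere).
payload = {
    "model": "gpt-4o",  # assumed model name for illustration
    "messages": [
        # "system" sets behaviour; "user" carries the actual request
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise GPT in one sentence."},
    ],
}

roles = [m["role"] for m in payload["messages"]]
print(roles)  # ['system', 'user']
```

The conversational state lives entirely in this `messages` list: to continue a dialogue, the caller appends the model's reply (role `"assistant"`) and the next user turn, then sends the whole list again.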
Why This Matters
GPT is the architecture that sparked the AI revolution in mainstream consciousness. Understanding GPT (what it is, how it evolved, and how it compares to alternatives) gives you the foundation to evaluate AI tools, understand industry developments, and make informed decisions about which models to use for your work.
Continue learning in Foundations
This topic is covered in our lesson: How Large Language Models Actually Work