
GPT (Generative Pre-trained Transformer)

Last reviewed: April 2026

A family of large language models created by OpenAI that generate human-like text by predicting the next word, forming the foundation of ChatGPT.

GPT stands for Generative Pre-trained Transformer, a family of large language models created by OpenAI. GPT models generate text by predicting the next token in a sequence, and are trained on vast amounts of text data from the internet. The GPT architecture is the foundation of ChatGPT and one of the most influential developments in AI history.
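Next-token prediction can be illustrated with a toy model. The lookup table and probabilities below are invented for demonstration; a real GPT model scores a vocabulary of tens of thousands of tokens with a neural network at every step, but the generation loop has the same shape:

```python
# Toy next-token model: maps a short context to invented probabilities.
# (Illustrative only; a real model computes these with a Transformer.)
TOY_MODEL = {
    ("the",): {"cat": 0.6, "dog": 0.4},
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("cat", "sat"): {"down": 0.9, "up": 0.1},
}

def predict_next(context):
    """Return the most likely next token given the recent context."""
    for key in (tuple(context[-2:]), tuple(context[-1:])):
        if key in TOY_MODEL:
            probs = TOY_MODEL[key]
            return max(probs, key=probs.get)
    return None

def generate(prompt, max_tokens=3):
    """Greedily append the most likely token, one step at a time."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = predict_next(tokens)
        if nxt is None:
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the"))  # -> "the cat sat down"
```

Real models sample from the probability distribution rather than always taking the top token, which is why the same prompt can produce different completions.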

Breaking down the name

  • Generative: The model generates new text rather than just classifying or extracting information
  • Pre-trained: The model is first trained on a massive general dataset (the "pre-training" phase) before being adapted for specific tasks
  • Transformer: The underlying neural network architecture that enables the model to process text efficiently using self-attention mechanisms
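The self-attention mechanism mentioned above can be sketched in a few lines. This is only the core arithmetic (scaled dot-product attention over toy 2-d vectors); real Transformers add learned projection matrices, many attention heads, and many stacked layers:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product self-attention (core arithmetic only).

    Each output position is a weighted average of all value vectors,
    weighted by how strongly its query matches every key.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

vecs = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(vecs, vecs, vecs)
```

In this example each position attends most strongly to itself, because its query matches its own key best; in a trained model, attention lets every token draw on context from anywhere in the sequence.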

The evolution of GPT

  • GPT-1 (2018): Demonstrated that pre-training on large text corpora followed by fine-tuning could produce strong language understanding. 117 million parameters.
  • GPT-2 (2019): Showed that scaling up (1.5 billion parameters) dramatically improved quality. OpenAI initially withheld the full model over concerns about misuse.
  • GPT-3 (2020): A massive leap to 175 billion parameters. Demonstrated few-shot learning: the ability to perform tasks from just a few examples in the prompt, without fine-tuning.
  • GPT-4 (2023): Multimodal (text + images), significantly improved reasoning, fewer hallucinations. Exact parameter count undisclosed.
  • GPT-4o (2024): Optimised version with faster inference and native multimodal capabilities.
  • o1/o3 (2024-2025): Models with explicit reasoning chains for complex problem-solving.

Why GPT matters

GPT demonstrated three crucial insights:

  1. Scaling works: Larger models trained on more data produce dramatically better results
  2. Pre-training is powerful: A model trained to predict the next word learns an enormous amount about language, facts, and reasoning
  3. Few-shot learning is practical: Large models can perform new tasks from just a few examples in the prompt, without any additional training
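A few-shot prompt simply embeds worked examples in the input. The task, wording, and labels below are illustrative, not from any particular API; the key point is that the model infers the task from the examples alone, with no weights updated:

```python
# A few-shot prompt for sentiment labelling. The model is expected to
# continue the pattern; the examples and labels here are invented.
few_shot_prompt = """\
Review: The food was wonderful. -> positive
Review: Terrible service, never again. -> negative
Review: Great value and friendly staff. ->"""
```

Sending this prompt to a sufficiently large model typically yields the continuation "positive", even though the model was never explicitly trained on this labelling task.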

GPT vs the competition

GPT is not the only large language model family:

  • Claude (Anthropic): Emphasises safety and nuance
  • Gemini (Google): Native multimodal design
  • Llama (Meta): Openly released weights, enabling local deployment
  • Mistral (Mistral AI): European open-source models

Each family makes different trade-offs between capability, safety, speed, and openness.

GPT as a platform

OpenAI has built GPT into a platform β€” the API enables developers to integrate GPT into any application, Custom GPTs allow non-technical users to create purpose-built assistants, and the ChatGPT interface provides direct consumer and business access. This platform approach has made GPT one of the most widely deployed AI technologies in the world.
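As a sketch of what API integration involves, the dict below mirrors the shape of a chat request body (a system message setting behaviour plus a user message). The model name and message contents are illustrative; actually sending it requires the official client library and an API key:

```python
# Sketch of a chat-style request body as a plain dict.
# Model name and content are illustrative assumptions.
request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain GPT in one sentence."},
    ],
    "temperature": 0.7,
}
```

The same request structure underlies Custom GPTs and the ChatGPT interface; the platform layers differ mainly in who authors the system message and tools.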


Why This Matters

GPT is the architecture that brought AI into mainstream consciousness. Understanding GPT (what it is, how it evolved, and how it compares to alternatives) gives you the foundation to evaluate AI tools, follow industry developments, and make informed decisions about which models to use for your work.
