GPT (Generative Pre-trained Transformer)
A family of large language models created by OpenAI that generate human-like text by predicting the next word, forming the foundation of ChatGPT.
GPT stands for Generative Pre-trained Transformer, a family of large language models created by OpenAI. GPT models generate text by predicting the next token in a sequence, trained on vast amounts of text data from the internet. The GPT architecture is the foundation of ChatGPT and one of the most influential developments in AI history.
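Next-token prediction can be sketched with a toy example: given the text so far, the model assigns a probability to each candidate next token, and generation repeatedly appends the most likely (or a sampled) one. The lookup table and probabilities below are invented purely for illustration; a real GPT computes these probabilities with a neural network over a vocabulary of tens of thousands of tokens.

```python
# Toy sketch of next-token prediction (probabilities are made up).
def next_token_probs(context):
    table = {
        "the cat sat on the": {"mat": 0.6, "sofa": 0.3, "moon": 0.1},
    }
    return table.get(context, {"<unk>": 1.0})

context = "the cat sat on the"
probs = next_token_probs(context)
best = max(probs, key=probs.get)   # greedy decoding: pick the most likely token
print(best)  # mat
```

In practice the chosen token is appended to the context and the process repeats, one token at a time, which is why long responses stream out incrementally.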
Breaking down the name
- Generative: The model generates new text rather than just classifying or extracting information
- Pre-trained: The model is first trained on a massive general dataset (the "pre-training" phase) before being adapted for specific tasks
- Transformer: The underlying neural network architecture that enables the model to process text efficiently using self-attention mechanisms
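The self-attention mechanism mentioned above can be sketched minimally in NumPy. This is a single attention head with no learned projections (a real Transformer derives separate query, key, and value vectors from learned weight matrices); it only shows the core idea that each token's output is a similarity-weighted mix of every token in the sequence.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a sequence of token vectors.

    x: array of shape (seq_len, d), one embedding per token.
    Simplification: queries, keys, and values are all x itself.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)  # pairwise similarities, (seq_len, seq_len)
    # softmax over each row so attention weights sum to 1 per token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x  # each output token: weighted mix of all input tokens

tokens = np.random.default_rng(0).normal(size=(4, 8))  # 4 tokens, 8-dim embeddings
out = self_attention(tokens)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token in parallel, this computation scales well on GPUs, which is a large part of why the Transformer displaced sequential architectures like RNNs.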
The evolution of GPT
- GPT-1 (2018): Demonstrated that pre-training on large text corpora followed by fine-tuning could produce strong language understanding. 117 million parameters.
- GPT-2 (2019): Showed that scaling up (1.5 billion parameters) dramatically improved quality. OpenAI initially withheld the full model over concerns about misuse.
- GPT-3 (2020): A massive leap to 175 billion parameters. Demonstrated few-shot learning: the ability to perform tasks from just a few examples in the prompt, without fine-tuning.
- GPT-4 (2023): Multimodal (text + images), significantly improved reasoning, fewer hallucinations. Exact parameter count undisclosed.
- GPT-4o (2024): Optimised version with faster inference and native multimodal capabilities.
- o1/o3 (2024-2025): Models with explicit reasoning chains for complex problem-solving.
Why GPT matters
GPT demonstrated three crucial insights:
- Scaling works: Larger models trained on more data produce dramatically better results
- Pre-training is powerful: A model trained to predict the next word learns an enormous amount about language, facts, and reasoning
- Few-shot learning is practical: Large models can perform new tasks from just a few examples in the prompt, without any additional training
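Few-shot learning is purely a prompting pattern: the task is specified entirely through worked examples in the input text, and the model completes the pattern. A sketch of such a prompt (the word pairs here are illustrative, in the style of the translation demos popularised with GPT-3):

```python
# A few-shot prompt: no fine-tuning, just examples followed by an
# unfinished line for the model to complete.
prompt = """Translate English to French.

sea otter -> loutre de mer
cheese -> fromage
plush giraffe ->"""

last_line = prompt.splitlines()[-1]
print(last_line)  # plush giraffe ->  (the line the model would complete)
```

The same structure works for classification, extraction, or formatting tasks: a handful of input/output pairs is often enough for a large model to infer the rule.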
GPT vs the competition
GPT is not the only large language model family:
- Claude (Anthropic): Emphasises safety and nuance
- Gemini (Google): Native multimodal design
- Llama (Meta): Open-source, enabling local deployment
- Mistral (Mistral AI): European open-source models
Each family makes different trade-offs between capability, safety, speed, and openness.
GPT as a platform
OpenAI has built GPT into a platform: the API enables developers to integrate GPT into any application, Custom GPTs allow non-technical users to create purpose-built assistants, and the ChatGPT interface provides direct consumer and business access. This platform approach has made GPT one of the most widely deployed AI technologies in the world.
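Programmatic access revolves around sending a list of role-tagged messages to a chat endpoint. The sketch below builds the request payload only (nothing is sent, so no API key is needed); the model name and message contents are illustrative, and the exact fields should be checked against OpenAI's current API reference.

```python
# Shape of a chat request payload (illustrative; not sent anywhere).
payload = {
    "model": "gpt-4o",  # assumed model name for illustration
    "messages": [
        # "system" sets behaviour; "user" carries the actual request
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarise GPT in one sentence."},
    ],
}

roles = [m["role"] for m in payload["messages"]]
print(roles)  # ['system', 'user']
```

The conversational state lives entirely in this `messages` list: to continue a dialogue, the caller appends the model's reply (role `"assistant"`) and the next user turn, then sends the whole list again.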
Why This Matters
GPT is the architecture that sparked the AI revolution in mainstream consciousness. Understanding GPT (what it is, how it evolved, and how it compares to alternatives) gives you the foundation to evaluate AI tools, understand industry developments, and make informed decisions about which models to use for your work.
Continue learning in Foundations
This topic is covered in our lesson: How Large Language Models Actually Work