Diffusion Model
A type of generative AI that creates images, audio, or video by learning to gradually remove noise from random static, producing remarkably realistic outputs.
A diffusion model is a generative AI system that creates new data, typically images, by learning to reverse a noise-adding process. It starts with pure random noise and gradually refines it into a coherent output. DALL-E, Midjourney, and Stable Diffusion are all built on diffusion models.
How diffusion works
The training process has two phases:
- Forward process (adding noise): take a real image and gradually add random noise over many steps until it becomes pure static. This step is fixed and requires no learning.
- Reverse process (removing noise): train a neural network to undo each step, predicting and removing the noise that was added. This is where the learning happens.
Once trained, the model can start from pure random noise and iteratively denoise it into a realistic image. The key insight is that by conditioning this denoising process on a text description, the model generates images that match the prompt.
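The forward process described above has a convenient closed form: you can jump straight to any noise level without simulating every step. Here is a minimal NumPy sketch of that forward process under a simple linear noise schedule. The schedule values and toy "image" are illustrative assumptions; real models use tuned schedules, and the learned reverse network is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule over T steps (real models often
# use tuned schedules such as cosine).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative signal retention

def forward_noise(x0, t):
    """Forward (noise-adding) process, jumping directly to step t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

# A toy 8x8 "image": by the final step almost no signal remains,
# so x_T is effectively pure static.
x0 = np.ones((8, 8))
x_T, noise = forward_noise(x0, T - 1)
signal_weight = float(np.sqrt(alpha_bars[-1]))  # close to 0 at step T
```

Training the reverse process amounts to showing the network `x_t` and `t` and asking it to predict `noise`; generation then runs this prediction backward from pure static, one step at a time.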
Why diffusion models won
Earlier generative approaches like GANs (generative adversarial networks) produced impressive results but were notoriously difficult to train, prone to mode collapse and training instability. Diffusion models are more stable, produce higher-quality outputs, and offer better control over the generation process.
Key concepts
- Denoising steps: more steps produce higher-quality images but take longer to generate
- Guidance scale: controls how strongly the output adheres to the text prompt (higher values are more faithful but less creative)
- Latent diffusion: performing the diffusion process in a compressed latent space rather than pixel space, dramatically reducing computational cost. This is what makes Stable Diffusion practical.
- ControlNet: additional conditioning that lets you guide generation with sketches, depth maps, or poses
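The guidance scale above has a simple form in classifier-free guidance, the technique behind Stable Diffusion's prompt adherence: the model makes two noise predictions, one with the prompt and one without, and the final prediction pushes past the unconditional one toward the conditioned one. A minimal sketch with toy NumPy arrays (function name and values are illustrative):

```python
import numpy as np

def apply_guidance(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the prompt-conditioned one. A scale of 1
    reproduces the conditioned prediction; larger values follow the
    prompt more strictly at the cost of diversity."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy predictions standing in for a real model's outputs.
eps_u = np.zeros(4)   # prediction without the prompt
eps_c = np.ones(4)    # prediction with the prompt
guided = apply_guidance(eps_u, eps_c, 7.5)  # each element is 7.5
```

A scale around 7 to 8 is a common default in Stable Diffusion interfaces; very high values tend to produce oversaturated, artifact-prone images.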
Beyond images
Diffusion models now generate audio (music, speech), video, 3D models, and even molecular structures for drug discovery. The core principle, learning to denoise, applies across data types.
Why This Matters
Diffusion models power the image and video generation tools transforming creative industries, marketing, and product design. Understanding how they work helps you use these tools more effectively, set realistic expectations for output quality, and make informed decisions about AI-generated content policies.