
Diffusion Model

Last reviewed: April 2026

A type of generative AI that creates images, audio, or video by learning to gradually remove noise from random static, producing remarkably realistic outputs.

A diffusion model is a generative AI system that creates new data — typically images — by learning to reverse a noise-adding process. It starts with pure random noise and gradually refines it into a coherent output. DALL-E, Midjourney, and Stable Diffusion are all built on diffusion models.

How diffusion works

The training process has two phases:

  1. Forward process (adding noise) — take a real image and gradually add random noise over many steps until it becomes pure static. This is automatic and requires no learning.
  2. Reverse process (removing noise) — train a neural network to reverse each step, predicting and removing the noise that was added. This is where the learning happens.
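The forward process above can be sketched in a few lines. A standard result is that noising can jump straight to any step t in closed form, mixing the clean data with Gaussian noise according to a cumulative schedule. The schedule values below are illustrative toy numbers, not the settings of any particular model.

```python
import math
import random

def forward_noise(x0, t, alpha_bars):
    """Sample x_t directly from the clean data x0 using the
    closed-form forward process: a weighted mix of x0 and noise."""
    ab = alpha_bars[t]
    noise = [random.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * n
          for x, n in zip(x0, noise)]
    return xt, noise  # the sampled noise is the network's training target

# Toy linear noise schedule over T steps and its cumulative products.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

x0 = [0.5, -1.2, 0.3]  # a tiny "image" of three pixel values
xt, eps = forward_noise(x0, T - 1, alpha_bars)
# At the final step alpha_bar is near zero, so xt is almost pure static.
```

During training, the network sees `xt` and the step index and is asked to predict `eps`; no learning is needed for the noising itself, matching phase 1 above.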

Once trained, the model can start from pure random noise and iteratively denoise it into a realistic image. The key insight is that by conditioning this denoising process on a text description, the model generates images that match the prompt.
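The iterative denoising described above can be sketched as a simple sampling loop in the style of DDPM. Here `denoise_fn` is a stand-in for the trained network that predicts the noise in its input, and the 50-step schedule is an illustrative toy, not a production configuration.

```python
import math
import random

def sample(denoise_fn, betas, alpha_bars, dim):
    """Start from pure Gaussian noise and iteratively denoise it.
    denoise_fn(x, t) stands in for the trained noise-prediction network."""
    x = [random.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        beta = betas[t]
        ab = alpha_bars[t]
        eps_hat = denoise_fn(x, t)  # network's guess at the noise in x
        # Mean update: subtract the predicted noise contribution, rescale.
        x = [(xi - beta / math.sqrt(1.0 - ab) * ei) / math.sqrt(1.0 - beta)
             for xi, ei in zip(x, eps_hat)]
        if t > 0:  # inject a little fresh noise except at the final step
            x = [xi + math.sqrt(beta) * random.gauss(0.0, 1.0) for xi in x]
    return x

# Toy 50-step schedule; a stub denoiser that predicts zero noise
# stands in for the trained model.
T = 50
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

out = sample(lambda x, t: [0.0] * len(x), betas, alpha_bars, dim=4)
```

In a real text-to-image system, `denoise_fn` would also receive an embedding of the prompt, which is what steers the loop toward images matching the description.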

Why diffusion models won

Earlier generative approaches like GANs (generative adversarial networks) produced impressive results but were notoriously difficult to train — prone to mode collapse and training instability. Diffusion models are more stable, produce higher-quality outputs, and offer better control over the generation process.

Key concepts

  • Denoising steps β€” more steps produce higher-quality images but take longer to generate
  • Guidance scale β€” controls how strongly the output adheres to the text prompt (higher values are more faithful but less creative)
  • Latent diffusion β€” performing the diffusion process in a compressed latent space rather than pixel space, dramatically reducing computational cost. This is what makes Stable Diffusion practical.
  • ControlNet β€” additional conditioning that lets you guide generation with sketches, depth maps, or poses

Beyond images

Diffusion models now generate audio (music, speech), video, 3D models, and even molecular structures for drug discovery. The core principle — learn to denoise — applies across data types.

Why This Matters

Diffusion models power the image and video generation tools transforming creative industries, marketing, and product design. Understanding how they work helps you use these tools more effectively, set realistic expectations for output quality, and make informed decisions about AI-generated content policies.
