Generative AI · Image Synthesis
The Art of
Controlled Noise
Diffusion models learn to reverse a gradual noising process — recovering structure from chaos one denoising step at a time. The result: strikingly photorealistic, endlessly creative images from plain text.
Overview
Noise in, image out
A diffusion model is trained by adding Gaussian noise to images in thousands of tiny steps, then learning a neural network that predicts and removes that noise. At inference, sampling begins from pure noise and iterates backward through the learned denoising path — sculpting coherent structure from randomness.
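The inference loop described above can be sketched in a few lines. This is a minimal illustration, not any particular library's API: `predict_noise` is a trivial placeholder for the trained network, and the linear β noise schedule is one common choice among several.

```python
import numpy as np

# Bird's-eye view of sampling, assuming an ε-predicting network
# (`predict_noise` is a placeholder, not a real model) and a linear
# β noise schedule — common choices, not the only ones.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)      # ᾱ_t = Π (1 − β_s)

def predict_noise(x_t, t):
    return np.zeros_like(x_t)            # stand-in for the trained U-Net

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # the "image" starts as pure noise
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # Remove the predicted noise, then re-inject a little fresh noise
    # on every step except the last.
    x = (x - betas[t] * eps / np.sqrt(1.0 - alpha_bar[t])) / np.sqrt(1.0 - betas[t])
    if t > 0:
        x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
```

Real samplers differ in how they schedule and combine these updates, but the backward walk from noise toward structure is the same.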
Notable Models
Two landmark architectures
Stable Diffusion
Runs the diffusion process in a compressed latent space rather than pixel space, making high-resolution synthesis feasible on consumer hardware. Open-weights and community-driven.
Latent Diffusion
DALL·E
OpenAI’s text-to-image lineage, combining CLIP-based text understanding with iterative diffusion refinement. Optimised for instruction-following and photorealism at scale.
Guided Diffusion
Process
Forward & reverse diffusion
Forward Process — Add Noise
A clean image is progressively corrupted by adding small amounts of Gaussian noise over T steps until the signal is entirely destroyed.
Train a Denoising Network
A U-Net (or transformer) learns to predict the noise added at each timestep, conditioned on the timestep embedding and optional text/class guidance.
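The training objective in miniature: sample a timestep and a noise vector, corrupt the image, and regress the exact noise that was added. `toy_denoiser` below is a hypothetical stand-in for the real U-Net; only the loss shape is the standard one.

```python
import numpy as np

T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def toy_denoiser(x_t, t):
    return np.zeros_like(x_t)            # placeholder for the network's ε̂

def training_loss(x0, rng):
    t = rng.integers(T)                  # random timestep
    eps = rng.standard_normal(x0.shape)  # random Gaussian noise
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_pred = toy_denoiser(x_t, t)
    return np.mean((eps - eps_pred) ** 2)  # MSE: true noise vs predicted

loss = training_loss(np.ones((8, 8)), np.random.default_rng(0))
```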
Text Conditioning via CLIP
A text prompt is encoded by a text encoder (CLIP's, in Stable Diffusion) and injected into the denoiser via cross-attention, steering the denoising trajectory toward semantically matching images.
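A bare-bones sketch of one cross-attention read. It assumes pre-projected features of matching width; real models first apply learned Q/K/V projections and use multiple heads.

```python
import numpy as np

def cross_attention(image_tokens, text_tokens):
    """Queries come from image features, keys/values from text embeddings."""
    d = image_tokens.shape[-1]
    scores = image_tokens @ text_tokens.T / np.sqrt(d)   # (img, txt) affinities
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over text tokens
    return weights @ text_tokens    # each image token pulls in matching text

rng = np.random.default_rng(0)
out = cross_attention(rng.standard_normal((64, 16)),     # 64 latent positions
                      rng.standard_normal((8, 16)))      # 8 prompt tokens
```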
Reverse Sampling
Starting from pure Gaussian noise, the model iteratively removes predicted noise using a sampler such as DDPM, DDIM, or DPM-Solver — converging on a coherent image in anywhere from roughly a thousand steps (classic DDPM) to a few tens (fast solvers).
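One DDPM-style reverse update, written out under the standard ε-prediction parameterisation: x_{t−1} = (x_t − β_t/√(1−ᾱ_t)·ε̂)/√α_t + √β_t·z. Here `eps_pred` stands in for the network's output.

```python
import numpy as np

def ddpm_step(x_t, t, eps_pred, betas, alpha_bar, rng):
    """One reverse step: subtract the scaled noise estimate, then
    re-add a small amount of fresh noise (except at t = 0)."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] * eps_pred / np.sqrt(1.0 - alpha_bar[t])) / np.sqrt(alpha_t)
    if t == 0:
        return mean                      # final step: deterministic
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x = ddpm_step(rng.standard_normal((8, 8)), T - 1,
              np.zeros((8, 8)), betas, alpha_bar, rng)
```

DDIM and DPM-Solver replace this stochastic update with (near-)deterministic ones, which is what lets them take far fewer, larger steps.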
Decode to Pixels
Latent-diffusion models pass the denoised latent through a VAE decoder to produce a full-resolution pixel image ready for use.
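To make the shapes concrete: a hypothetical stand-in for the decode step. A real VAE decoder is a learned convolutional network, but the tensor shapes below match Stable Diffusion at 512 px, where latents are 4 × 64 × 64 and decoding upsamples by 8×.

```python
import numpy as np

def toy_decode(latent):
    """Toy decoder: 8× nearest-neighbour upsample plus a fixed
    4 → 3 channel mix. Illustrates shapes only, not a real VAE."""
    c, h, w = latent.shape                              # e.g. (4, 64, 64)
    up = latent.repeat(8, axis=1).repeat(8, axis=2)     # (c, 8h, 8w)
    mix = np.random.default_rng(0).standard_normal((3, c))
    return np.tensordot(mix, up, axes=1)                # (3, 8h, 8w) "RGB"

img = toy_decode(np.zeros((4, 64, 64)))
```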