Generative AI is a branch of artificial intelligence whose purpose is to learn the underlying distribution of a dataset (e.g., landscape images, portraits, oil paintings, etc.) and, based on that learning, generate new samples belonging to the same distribution but unseen during training. In other words: create something new that looks real.
Unlike discriminative models — which learn to distinguish between classes (e.g., dog vs. cat) — generative models learn to reconstruct or simulate reality based on statistical patterns. This capability makes them especially useful in contexts where data is scarce, expensive to obtain, or simply nonexistent, enabling the synthesis of artificial examples that enrich the creative or training process.
Over the past decade, several families of generative models have emerged, each with its own strengths, weaknesses, and preferred application areas. Understanding this landscape helps contextualize why diffusion models have become so popular.
Autoregressive models generate data sequentially, predicting the next element (pixel, token, word) based on previous elements. Classic examples include PixelRNN and PixelCNN for images, and GPT for text.
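To make the sampling loop concrete, here is a minimal sketch in Python: tokens are drawn one at a time, each conditioned on everything generated so far. The vocabulary, the `next_token_probs` function, and its probabilities are toy stand-ins invented for this example; in a real autoregressive model, that function would be a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = np.array(["a", "b", " "])  # toy three-symbol vocabulary

def next_token_probs(context: str) -> np.ndarray:
    # Stand-in for a trained model: a real autoregressive network would
    # output p(x_t | x_<t); here we hard-code toy conditional probabilities.
    if context.endswith("a"):
        return np.array([0.1, 0.6, 0.3])  # p("a"), p("b"), p(" ")
    return np.array([0.5, 0.3, 0.2])

def sample(length: int = 20) -> str:
    out = ""
    for _ in range(length):
        probs = next_token_probs(out)
        out += rng.choice(VOCAB, p=probs)  # draw x_t ~ p(x_t | x_<t)
    return out

print(sample())
```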
Introduced in 2014 by Ian Goodfellow and colleagues, GANs consist of two neural networks competing against each other: a generator that creates fake samples, and a discriminator that tries to distinguish real samples from fake ones. The generator iteratively improves until the discriminator can no longer tell them apart.
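A minimal sketch of this adversarial training loop, assuming PyTorch and a one-dimensional toy problem where the generator learns to mimic samples from N(3, 1). The layer sizes, learning rates, and step count are illustrative choices, not values from the original paper:

```python
import torch
import torch.nn as nn

latent_dim = 8

# Generator: noise z -> fake sample. Discriminator: sample -> P(real).
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 3.0      # real samples from the target N(3, 1)
    z = torch.randn(64, latent_dim)
    fake = G(z)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make D classify fakes as real.
    loss_g = bce(D(G(z)), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(5, latent_dim)).detach().squeeze())  # should cluster near 3
```

The `fake.detach()` call is the important detail: it stops the discriminator's loss from flowing gradients into the generator, so each network is optimized only against its own objective.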
Normalizing flows use invertible and differentiable transformations to map input data to a simple base distribution (e.g., a Gaussian) and back again. Examples include RealNVP and Glow.
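Invertibility is what makes the likelihood tractable: if f maps a data point x to the base variable z = f(x), the change-of-variables formula gives the exact density

$$
\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|,
$$

and training simply maximizes this log-likelihood over the dataset. This exact density is the main appeal of flows: GANs, by contrast, provide no tractable likelihood at all.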
Diffusion models, which rose to prominence from 2020 onward, are based on a thermodynamics-inspired process: Gaussian noise is gradually added to the data until it becomes indistinguishable from pure Gaussian noise, and a neural network is then trained to reverse the process, removing noise step by step to reconstruct a coherent image.
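The forward (noising) half of this process has a convenient closed form: under a DDPM-style schedule, the noisy sample x_t can be drawn directly from the clean data x_0 without simulating every intermediate step. A small sketch, assuming the standard linear noise schedule (the schedule endpoints and step count below are common illustrative values):

```python
import numpy as np

# Forward diffusion sketch: gradually noise a data point x0 over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule beta_t
alphas_bar = np.cumprod(1.0 - betas)   # cumulative product, \bar{alpha}_t

rng = np.random.default_rng(0)

def noisy_sample(x0: np.ndarray, t: int) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ab_t) * x_0, (1 - ab_t) * I)."""
    ab = alphas_bar[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

x0 = np.ones(4)  # toy "image" of 4 pixels
for t in [0, 250, 999]:
    print(t, noisy_sample(x0, t))  # signal fades, noise dominates as t grows
```

The generative model is then trained to undo these steps, typically by predicting the noise that was added at each one.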