2026-01-12

Manifold Hypothesis

Consider the CIFAR-10 dataset
Each image is of the dimension $32 \times 32 \times 3$
We can consider this dataset as a single vector of dimension $[0, 255]^{3072}$
If we were to draw uniformly from this, we would never accidentally stumble upon a sample from CIFAR-10
Why?
- One reason is that the pixels aren’t completely unrelated as we are drawing them
- If a pixel is $0, 0, 0$ then its neighbors are most likely $0, 0, 0$ or very close that white shade
- It is highly unlikely that we can replicate this dependence by uniformly sampling each pixel independently
The Manifold Hypothesis, in layman’s terms, is that this higher dimensional data lives in a lower dimensional latent space of some kind