This addendum formalizes trajectory space search in stochastic diffusion models. The key idea is that as we move from deterministic diffusers (the setting DeepMind's framing assumes) to stochastic ones, the natural search domain shifts from noise space to trajectory space, potentially enabling more expressive inference.
1. Definitions
- \( \mathcal{X} \): data space (e.g., images, text sequences)
- \( x_T \sim p_T(x) \): initial noise drawn from a base Gaussian distribution
- \( x_t \): sample at diffusion step \( t \) (from \( T \) to 0)
- \( \pi = \{x_T, x_{T-1}, \ldots, x_0\} \): a full denoising trajectory
- \( \pi(x_T) \): a trajectory sampled from the default reverse process starting at \( x_T \)
- \( \mathcal{T}(x_T) \): set of all possible trajectories starting at \( x_T \)
- \( f(\pi) \): objective function scoring the final output \( x_0 \)
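To make these definitions concrete, here is a minimal sketch in Python/NumPy. The linear drift, noise schedule, target value, and objective below are all illustrative stand-ins, not part of the formalism:

```python
import numpy as np

T = 10  # number of diffusion steps

def rollout(x_T, rng):
    """Sample a full denoising trajectory pi = [x_T, ..., x_0].
    The linear drift below is a toy stand-in for a learned reverse model."""
    traj = [x_T]
    x = x_T
    for t in range(T, 0, -1):
        eps_t = rng.standard_normal()   # noise injected at step t
        mu = x - 0.1 * (x - 2.0)        # toy denoising drift toward 2.0
        sigma = 0.1 * (t / T)           # toy noise schedule
        x = mu + sigma * eps_t
        traj.append(x)
    return traj

def f(traj):
    """Objective scoring the final output x_0 (higher is better)."""
    return -abs(traj[-1] - 2.0)

rng = np.random.default_rng(0)
x_T = rng.standard_normal()   # x_T ~ p_T
pi = rollout(x_T, rng)        # one sample from T(x_T)
score = f(pi)
```

Each call to `rollout` with the same `x_T` draws a different element of \( \mathcal{T}(x_T) \), since the noise terms are resampled internally.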
2. DeepMind's Framing: Noise Space Search
Optimize over \( x_T \) to maximize final sample quality via the diffusion model:
\[ x_T^\ast = \arg\max_{x_T} f(\pi(x_T)) \]
But this presumes the mapping \( x_T \to x_0 \) is deterministic (or at least that \( f(\pi(x_T)) \) is well-defined). In a stochastic denoiser the same \( x_T \) induces a distribution over outputs, so \( f(\pi(x_T)) \) is a random variable rather than a fixed objective, and the argmax above is ill-posed.
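A sketch of noise-space search under a stochastic reverse process, using the same toy setup (drift, schedule, and score are illustrative assumptions). Because the per-step noise is resampled on every rollout, re-evaluating the "best" \( x_T \) need not reproduce its score:

```python
import numpy as np

T = 10

def rollout(x_T, rng):
    """Toy stochastic reverse process; returns the final sample x_0."""
    x = x_T
    for t in range(T, 0, -1):
        x = x - 0.1 * (x - 2.0) + 0.1 * (t / T) * rng.standard_normal()
    return x

def f(x0):
    return -abs(x0 - 2.0)  # task score on the final output

rng = np.random.default_rng(0)

# Best-of-N search over the initial noise x_T only.
candidates = rng.standard_normal(64)
scores = [f(rollout(x_T, rng)) for x_T in candidates]
x_T_star = candidates[int(np.argmax(scores))]

# The map x_T -> f is stochastic: scoring x_T_star again gives a new draw,
# which is exactly the failure mode of noise-space search in this setting.
rescore = f(rollout(x_T_star, rng))
```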
3. Proposed Framing: Trajectory Space Search
Instead, treat the full denoising trajectory \( \pi \) as the search object, optimizing over all trajectories reachable from any starting noise:
\[ \pi^\ast = \arg\max_{\pi \in \mathcal{T}} f(\pi), \quad \text{where } \mathcal{T} = \bigcup_{x_T} \mathcal{T}(x_T) \]
4. Degrees of Freedom
Reparameterize each transition so its injected noise is explicit: \( x_{t-1} = \mu_\theta(x_t, t) + \sigma_t \epsilon_t \) with \( \epsilon_t \sim \mathcal{N}(0, I) \), which is equivalent to sampling \( x_{t-1} \sim p_\theta(x_{t-1} \mid x_t) \). The trajectory control space is then the full noise path:
\[ \Pi = \{ \epsilon_T, \epsilon_{T-1}, \ldots, \epsilon_1 \} \]
Together with the initial noise \( x_T \), a choice of \( \Pi \) determines the trajectory deterministically.
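Under this reparameterization, fixing the noise path makes the sampler deterministic: the same \( (x_T, \epsilon_T, \ldots, \epsilon_1) \) always yields the same trajectory. A sketch, reusing the same toy drift and schedule as assumptions:

```python
import numpy as np

T = 10

def rollout_with_noise(x_T, eps):
    """Reverse process with the noise path eps = (eps_T, ..., eps_1) as an
    explicit argument: x_0 is a deterministic function of (x_T, eps)."""
    x = x_T
    for i, t in enumerate(range(T, 0, -1)):
        x = x - 0.1 * (x - 2.0) + 0.1 * (t / T) * eps[i]
    return x

rng = np.random.default_rng(0)
x_T = rng.standard_normal()
eps = rng.standard_normal(T)  # one point in the control space Pi

# Same controls, same outcome: the trajectory is now a search object,
# not an uncontrolled random draw.
assert rollout_with_noise(x_T, eps) == rollout_with_noise(x_T, eps)
```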
5. Expressivity Through Trajectory Search
In stochastic diffusion models, the outcome distribution is already high-variance due to noise injected at each step. But variance alone doesn’t imply structure or direction. Trajectory space search reframes generation not as random sampling, but as a search problem over entire denoising paths.
Let \( f(\pi) \) be a task-aligned score of a trajectory \( \pi \), such as realism, reward, or semantic alignment. Define the set of scores achievable when searching over full noise paths \( \Pi = \{\epsilon_T, \ldots, \epsilon_1\} \) as:
\[ S_\Pi = \{ f(\pi(x_T, \epsilon_T, \ldots, \epsilon_1)) : x_T, \epsilon_t \in \mathbb{R}^d \} \]
Compare this to the narrower set reachable when only the initial noise \( x_T \) is chosen and the reverse process runs stochastically:
\[ S_{x_T} = \{ f(\pi(x_T)) : x_T \sim p_T \} \subseteq S_\Pi \]
In other words, trajectory space offers a broader and more navigable landscape of possible generations. While both methods sample from the same underlying stochastic process, trajectory search treats inference as a deliberate exploration of denoising paths rather than a one-shot roll from a prior.
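The contrast can be sketched with the same toy process: noise-space search can only resample, while trajectory-space search can hold a noise path fixed and refine it locally. The perturbation scale and search budgets below are arbitrary assumptions:

```python
import numpy as np

T, N = 10, 128

def rollout(x_T, eps):
    """Deterministic rollout given the full noise path (toy drift/schedule)."""
    x = x_T
    for i, t in enumerate(range(T, 0, -1)):
        x = x - 0.1 * (x - 2.0) + 0.1 * (t / T) * eps[i]
    return x

def f(x0):
    return -abs(x0 - 2.0)

rng = np.random.default_rng(0)

# Noise-space search: vary x_T, with the noise path resampled every rollout.
best_noise = max(f(rollout(rng.standard_normal(), rng.standard_normal(T)))
                 for _ in range(N))

# Trajectory-space search: greedy hill-climbing on the full noise path,
# possible only because eps is an explicit, reusable search variable.
x_T = rng.standard_normal()
eps = rng.standard_normal(T)
init_score = f(rollout(x_T, eps))  # baseline before refinement
best_traj = init_score
for _ in range(N):
    cand = eps + 0.3 * rng.standard_normal(T)  # local perturbation of the path
    s = f(rollout(x_T, cand))
    if s > best_traj:
        eps, best_traj = cand, s
```

The hill-climbing step here is the simplest possible path-search strategy; any optimizer over \( (x_T, \epsilon_T, \ldots, \epsilon_1) \) fits the same framing.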
Interpretation
This theory reframes generative inference in diffusion models as a structured search over stochastic trajectories. Since randomness accumulates at every step, the final output depends not just on the starting point \( x_T \) but on the full sequence of noise injections that guide the process. Optimizing over these noise trajectories exposes a richer set of candidate generations, enabling more expressive and reliable sampling without changing the model itself. The cost is extra inference compute, spent not on added control mechanisms but on deeper search.