Scaling Inference Compute for Denoising Diffusion Probabilistic Models

Niel Ok, Hemal Arora, Siya Goel
Stanford University · March 2025
Code

TL;DR: We present a novel framework for scaling inference compute in stochastic diffusion models beyond simply adding denoising steps, extending prior work from deterministic (ODE-based) noise space search to the stochastic setting of DDPMs. Prior work from DeepMind frames inference-time optimization as a search over initial noise vectors in deterministic reverse processes; we generalize this to stochastic processes by introducing trajectory space search, which treats the entire sequence of reverse-time noise injections as a structured object of optimization and enables targeted exploration of the denoising landscape. We additionally develop a test-time conditioning mechanism: verifier-guided selection over sampled trajectories using a small labeled subset, which steers outputs without retraining. Despite limited compute resources, our implementation demonstrates inference-time control on MNIST with minimal supervision. We suspect that trajectory-aware compute allocation (non-uniform distribution of inference effort across steps) can further improve performance and generalization, and will investigate this in future work. Finally, since our framing generalizes inference scaling to the stochastic setting, we intend to extend it to domains that rely on stochastic diffusion models, such as robotics and complex image and video generation.
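To make the selection mechanism concrete, below is a minimal PyTorch sketch of verifier-guided best-of-N search over stochastic DDPM trajectories. All names are illustrative assumptions rather than the repo's actual API: eps_model(x, t) is a trained noise predictor, verifier(x0) returns a scalar task score, and betas is the forward noise schedule (e.g., torch.linspace(1e-4, 0.02, 1000)).

import torch

@torch.no_grad()
def ddpm_sample(eps_model, betas, shape, generator=None):
    """One stochastic DDPM rollout (ancestral sampling, Ho et al., 2020)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, generator=generator)          # x_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)                            # predicted noise at step t
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            # fresh per-step injection eps_t, with sigma_t = sqrt(beta_t)
            x = mean + betas[t].sqrt() * torch.randn(shape, generator=generator)
        else:
            x = mean                                     # final step is noiseless
    return x

@torch.no_grad()
def best_of_n(eps_model, verifier, betas, shape, n=16):
    """Trajectory space search in its simplest form: sample n full stochastic
    trajectories and keep the one whose final sample the verifier scores highest."""
    best_x, best_score = None, -float("inf")
    for _ in range(n):
        x0 = ddpm_sample(eps_model, betas, shape)
        score = float(verifier(x0))
        if score > best_score:
            best_x, best_score = x0, score
    return best_x, best_score

Random best-of-N is only the baseline search strategy; anything that proposes and scores full noise paths (local perturbation, evolutionary search, etc.) slots into the same loop.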

The future_generalization_experiment directory contains a prototype to evaluate the discriminative power of the verifier across domains.

To the best of our knowledge, this is the first empirical and theoretical instantiation of trajectory space inference scaling for stochastic diffusion models.

We gratefully acknowledge compute support provided by researchers at DeepMind.

Addendum: Trajectory Space Search Theory (from Niel's Blog)

May 2, 2025

This addendum formalizes trajectory space search theory in stochastic diffusion models. The key idea is to shift the search domain from noise space to trajectory space as we move from deterministic diffusers (under DeepMind's framework) to stochastic ones, potentially enabling more expressive inference.

1. Definitions

Let \( T \) denote the number of diffusion steps and \( p_\theta \) the learned reverse process of a DDPM. A trajectory \( \pi = (x_T, x_{T-1}, \ldots, x_0) \) is a single reverse-time rollout; \( \mathcal{T} \) denotes the set of all such trajectories, and \( \mathcal{T}(x_T) \subset \mathcal{T} \) those starting from a fixed \( x_T \). A verifier \( f : \mathcal{T} \to \mathbb{R} \) assigns a task-aligned score to a trajectory (in practice, to its final sample \( x_0 \)). When the reverse process is deterministic, we write \( \pi(x_T) \) for the unique trajectory induced by the initial noise \( x_T \).

2. DeepMind's Framing: Noise Space Search

In this framing, one optimizes over the initial noise \( x_T \) to maximize the verifier score of the induced trajectory:

\[ x_T^\ast = \arg\max_{x_T} f(\pi(x_T)) \]

But this assumes the mapping \( x_T \to x_0 \) is deterministic (or at least injective), which holds for ODE-based samplers such as DDIM but fails in stochastic samplers like DDPMs, where a single \( x_T \) induces a distribution over outputs \( x_0 \).
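For reference, the standard DDPM ancestral update (Ho et al., 2020), with \( \alpha_t = 1 - \beta_t \) and \( \bar{\alpha}_t = \prod_{s \le t} \alpha_s \), injects fresh Gaussian noise at every step:

\[ x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \, \epsilon_\theta(x_t, t) \right) + \sigma_t \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, I) \]

Fixing \( x_T \) therefore pins down only one of the \( T \) independent sources of randomness in a rollout; the remaining injections \( \epsilon_t \) are what trajectory space search brings under control.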

3. Proposed Framing: Trajectory Space Search

Instead, treat the full denoising trajectory \( \pi \) as the search object:

\[ \pi^\ast = \arg\max_{\pi \in \mathcal{T}} f(\pi) \]

4. Degrees of Freedom

Write each reverse transition as \( x_{t-1} = \mu_\theta(x_t, t) + \sigma_t \epsilon_t \), where \( \epsilon_t \sim \mathcal{N}(0, I) \) is the noise injected at step \( t \); conditioned on \( \epsilon_t \), the transition is deterministic. Collecting the per-step injections defines the trajectory control space of full noise paths:

\[ \Pi = \{ \epsilon = (\epsilon_T, \epsilon_{T-1}, \ldots, \epsilon_1) \} \]

Together with the initial noise \( x_T \), a path \( \epsilon \in \Pi \) determines the entire trajectory, which we write \( \pi(x_T, \epsilon) \).
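To make this control space concrete, here is a minimal PyTorch sketch (using the same hypothetical eps_model(x, t) predictor as above, not the repo's actual API) of a reverse rollout driven by an explicit noise path; conditioned on the path, the rollout is fully deterministic:

import torch

@torch.no_grad()
def rollout_from_path(eps_model, betas, x_T, noise_path):
    """Deterministic replay of a DDPM trajectory from its full noise path.

    noise_path[t] is the injection eps_t applied at step t (unused at t == 0),
    so a fixed (x_T, noise_path) always reproduces the same x_0. Searching over
    (x_T, noise_path) is exactly search over the control space Pi above.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = x_T
    for t in range(len(betas) - 1, -1, -1):
        eps = eps_model(x, t)
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * noise_path[t]
        else:
            x = mean
    return x

Because the same \( (x_T, \epsilon_T, \ldots, \epsilon_1) \) always reproduces the same \( x_0 \), candidate trajectories can be cached, replayed, and locally perturbed (e.g., resampling \( \epsilon_t \) on a subset of steps), which is the natural hook for the non-uniform compute allocation mentioned in the TL;DR.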

5. Expressivity Through Trajectory Search

In stochastic diffusion models, the outcome distribution is already high-variance due to noise injected at each step. But variance alone doesn’t imply structure or direction. Trajectory space search reframes generation not as random sampling, but as a search problem over entire denoising paths.

Let \( f(\pi) \) be a task-aligned score of a trajectory \( \pi \), such as realism, reward, or semantic alignment. Define the range of achievable scores when searching over the initial noise \( x_T \) and the full noise path \( \epsilon \in \Pi \) as:

\[ \mathcal{R}(f, \Pi) := \sup_{x_T,\, \epsilon \in \Pi} f(\pi(x_T, \epsilon)) - \inf_{x_T,\, \epsilon \in \Pi} f(\pi(x_T, \epsilon)) \]

Compare this to what is reachable when only the initial noise \( x_T \) is controlled and the per-step injections run stochastically: search can then steer only the expected score over the trajectory distribution \( \mathcal{T}(x_T) \), so we write \( \mathcal{R}(f, \mathcal{T}(x_T)) \) for the corresponding range when \( x_T \) alone varies and each candidate is scored by \( \mathbb{E}_{\pi \sim \mathcal{T}(x_T)}[f(\pi)] \). The working hypothesis of this theory is that

\[ \mathcal{R}(f, \Pi) \gg \mathcal{R}(f, \mathcal{T}(x_T)) \]

In other words, trajectory space offers a broader and more navigable landscape of possible generations. While both methods sample from the same underlying stochastic process, trajectory search treats inference as a deliberate exploration of denoising paths rather than a one-shot roll from a prior.
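This inequality is stated as a hypothesis rather than a theorem, but it is directly measurable. The sketch below (reusing the hypothetical eps_model, verifier, ddpm_sample, and rollout_from_path from the earlier snippets, so it is not standalone) estimates both ranges by Monte Carlo: the full-path range scores independent stochastic rollouts, while the x_T-only range scores each candidate x_T by its mean verifier score over uncontrolled injections.

import torch

def empirical_range(scores):
    """Empirical proxy for R(f, .): spread of observed verifier scores."""
    return max(scores) - min(scores)

@torch.no_grad()
def range_over_paths(eps_model, verifier, betas, shape, n=64):
    # Full trajectory control: every rollout is a distinct (x_T, eps_T..eps_1).
    scores = [float(verifier(ddpm_sample(eps_model, betas, shape)))
              for _ in range(n)]
    return empirical_range(scores)

@torch.no_grad()
def range_over_xT(eps_model, verifier, betas, shape, n=64, m=8):
    # x_T-only control: injections are uncontrolled, so each candidate x_T
    # is scored by its mean verifier score over m stochastic rollouts.
    scores = []
    for _ in range(n):
        x_T = torch.randn(shape)
        vals = []
        for _ in range(m):
            eps_path = [torch.randn(shape) for _ in range(len(betas))]
            vals.append(float(verifier(rollout_from_path(eps_model, betas, x_T, eps_path))))
        scores.append(sum(vals) / m)
    return empirical_range(scores)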

Interpretation

This theory reframes generative inference in diffusion models as a structured search over stochastic trajectories. Since randomness accumulates at every step, the final output depends not just on the starting point \( x_T \) but on the full sequence of noise injections that guide the process. Optimizing over these noise trajectories exposes a richer set of candidate generations, enabling more expressive and reliable sampling without changing the model itself. The trade is increased inference compute for better outputs, obtained not through added control mechanisms but through deeper search.

BibTeX

@misc{ok2025trajectory,
  title        = {Scaling Inference Compute for Denoising Diffusion Probabilistic Models},
  author       = {Niel Ok and Hemal Arora and Siya Goel},
  year         = {2025},
  month        = {March},
  institution  = {Stanford University},
  url          = {https://nielok.github.io/diffusion_test_time_compute/}
}