University of Oxford · Harvard University · Kempner Institute
Controlling generative models is computationally expensive. Optimal alignment with a reward function requires estimating the value function, which in turn demands access to the conditional distribution of clean data given a noisy sample, $p_{1|t}(x_1|x_t)$; obtaining it typically involves costly trajectory simulations. We introduce Meta Flow Maps (MFMs), which extend consistency models and flow maps into the stochastic regime. MFMs perform one-step posterior sampling, generating i.i.d. draws of clean data from any intermediate state, with a differentiable reparametrization for efficient value function estimation. This enables inference-time steering without rollouts and unbiased off-policy fine-tuning. Our steered-MFM sampler outperforms Best-of-1000 on ImageNet at a fraction of the compute.
MFM steering achieves higher rewards with >100× fewer function evaluations than Best-of-N.
Modern generative models, such as diffusion and flow models, produce stunning samples, but controlling them remains a core challenge. Whether we want to steer generation toward high-reward outputs at inference time, or permanently fine-tune a model to align with human preferences, we face the same fundamental bottleneck: estimating the value function.
The value function tells us how good a noisy intermediate state is—but computing it requires sampling from the conditional posterior: the distribution of all possible clean outputs consistent with that noisy state. Existing methods either approximate this posterior crudely (introducing bias) or simulate expensive trajectories (killing efficiency).
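Concretely, given a reward $r$ on clean data, the value function of an intermediate state is the posterior expectation of the reward (a standard definition, consistent with the description above), which is exactly why posterior samples are needed:

$$V_t(x) \;=\; \mathbb{E}_{x_1 \sim p_{1|t}(\cdot\,|\,x)}\big[r(x_1)\big].$$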
Meta Flow Maps solve this dilemma by learning to sample the full posterior in a single forward pass, enabling both efficient steering and unbiased fine-tuning.
An MFM conditions on an intermediate time–state pair $(t, x)$ and learns a shared conditional flow that maps base noise $\varepsilon$ to endpoint samples $x_1$ from the posterior $p_{1|t}(\cdot|x)$. Varying the initial noise yields multiple i.i.d. samples from the same posterior.
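As a concrete sketch of this interface (names and architecture here are hypothetical stand-ins, not the paper's), an MFM is a single network taking $(\varepsilon, t, x)$, so i.i.d. posterior draws come from batching over $\varepsilon$:

```python
import torch

# Hypothetical stand-in for a trained MFM network X_{0,1}(eps; t, x).
# A real model would be a trained image network; this untrained toy
# linear net just keeps the sketch runnable.
class ToyMFM(torch.nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = torch.nn.Linear(2 * dim + 1, dim)

    def forward(self, eps, t, x):
        # Condition on the context (t, x) and transport the base noise eps.
        t_feat = t.expand(eps.shape[0], 1)
        return self.net(torch.cat([eps, t_feat, x.expand_as(eps)], dim=-1))

dim, n_samples = 8, 4
mfm = ToyMFM(dim)
t = torch.tensor([0.5])            # intermediate time
x = torch.randn(1, dim)            # one noisy state x_t
eps = torch.randn(n_samples, dim)  # n i.i.d. base-noise draws

x1_samples = mfm(eps, t, x)        # n i.i.d. samples from p_{1|t}(. | x), one forward pass
print(x1_samples.shape)            # torch.Size([4, 8])
```

Each row of `x1_samples` is an independent draw from the same posterior, produced without simulating a trajectory.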
A function $\Phi(\varepsilon; c)$ that maps base noise $\varepsilon$ (gray squares on the left) to samples from a target distribution $p_c$, indexed by a context $c$:
The context $c$ "selects" which distribution to sample from.
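The simplest instance of such a context-indexed sampler is the Gaussian reparametrization trick, where the context is just a mean and a scale; this toy example is ours, not from the paper, but it shows how $c$ selects the target distribution:

```python
import torch

def phi(eps: torch.Tensor, c: tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
    """Context-indexed sampler: maps base noise eps to a draw from p_c.

    For the Gaussian family, c = (mu, sigma) and phi(eps; c) = mu + sigma * eps,
    so phi(eps; c) ~ N(mu, sigma^2) when eps ~ N(0, 1).
    """
    mu, sigma = c
    return mu + sigma * eps

eps = torch.randn(1000)
samples_a = phi(eps, (torch.tensor(0.0), torch.tensor(1.0)))  # c selects N(0, 1)
samples_b = phi(eps, (torch.tensor(3.0), torch.tensor(0.5)))  # c selects N(3, 0.25)
print(samples_a.mean().item(), samples_b.mean().item())
```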
Here the context is $(t, x)$: a time and a noisy image (e.g., the blurry dog at $(t_a, x_a)$). The MFM $X_{0,1}(\varepsilon; t, x)$ maps noise to clean images consistent with that noisy state:
The clean dogs on the right are i.i.d. samples from $p_{1|t_a}(\cdot|x_a)$; different noise draws $\varepsilon, \varepsilon'$ yield different valid reconstructions.
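With i.i.d. one-step posterior samples like these, the value function from the display above reduces to a plain Monte Carlo average. A minimal sketch, with toy stand-ins rather than the paper's trained models:

```python
import torch

def estimate_value(mfm, reward, t, x, n_samples: int = 64):
    """Monte Carlo estimate of V_t(x) = E[ r(x1) | x_t = x ].

    mfm:    one-step posterior sampler, mfm(eps, t, x) -> x1 samples
    reward: reward on clean data, reward(x1) -> one scalar per sample
    """
    eps = torch.randn(n_samples, x.shape[-1])  # i.i.d. base noise
    x1 = mfm(eps, t, x)                        # n_samples posterior draws, one pass
    return reward(x1).mean()                   # empirical posterior expectation

# Toy stand-ins (illustrative only, not the paper's models):
mfm = lambda eps, t, x: x + (1.0 - t) * eps    # placeholder sampler
reward = lambda x1: -(x1 ** 2).sum(dim=-1)     # e.g. prefer samples near zero
v = estimate_value(mfm, reward, t=torch.tensor(0.5), x=torch.randn(1, 8))
print(v.item())
```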
Key insight: The stochastic interpolant defines an infinite family of conditional posteriors $p_{1|t}(\cdot|x)$, one for each time–state pair $(t,x)$ drawn from the law of the interpolant itself. Each posterior has a corresponding ODE that transports noise to it, and each such ODE has a flow map compressing its trajectories to a single step. A Meta Flow Map learns to select from this infinite collection: given context $(t_a, x_a)$ (noisy dog), it picks the flow map for the dog posterior; given $(t_b, x_b)$ (noisy flower), it picks the flower posterior, all via the same learned network $X_{0,1}$. Because the MFM is differentiable along the law of the interpolant, we can differentiate through its samples to estimate the gradient of the value function.
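That differentiability is what makes reparametrized gradient estimates possible: push the reward gradient straight through the one-step sampler. A minimal autograd sketch with the same toy stand-ins as above (the steering update at the end is illustrative, not the paper's exact sampler):

```python
import torch

def value_grad(mfm, reward, t, x, n_samples: int = 64):
    """Reparametrized estimate of grad_x V_t(x).

    Since x1 = X_{0,1}(eps; t, x) is differentiable in x, gradients of the
    reward flow through the one-step sampler:
        grad_x V_t(x) ~= (1/N) sum_i grad_x r(X_{0,1}(eps_i; t, x)).
    """
    x = x.detach().requires_grad_(True)
    eps = torch.randn(n_samples, x.shape[-1])
    v_hat = reward(mfm(eps, t, x)).mean()      # Monte Carlo value estimate
    (grad,) = torch.autograd.grad(v_hat, x)    # differentiate through the sampler
    return grad

# Toy stand-ins as before (illustrative only):
mfm = lambda eps, t, x: x + (1.0 - t) * eps
reward = lambda x1: -(x1 ** 2).sum(dim=-1)
x = torch.randn(1, 8)
g = value_grad(mfm, reward, t=torch.tensor(0.5), x=x)
# One illustrative steering move: nudge the noisy state toward higher value.
x_steered = x + 0.1 * g
```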
MFMs achieve competitive sample quality while enabling efficient reward alignment on ImageNet 256×256.
@article{potaptchik2025metaflowmaps,
title={Meta Flow Maps enable scalable reward alignment},
author={Potaptchik, Peter and Saravanan, Adhi and
Mammadov, Abbas and Prat, Alvaro and
Albergo, Michael S. and Teh, Yee Whye},
journal={arXiv preprint},
year={2025}
}