
Introduction

Flow-based generative models [?] have emerged as a powerful class of models for generating high-quality samples of complex data such as images and videos. These models leverage neural networks to transform random noise into complex data by applying a sequence of invertible transformations, allowing for both novel sample generation and likelihood estimation. The success of flow models is due in part to the introduction of flow matching [?], which enables training without computationally expensive simulation and allows the use of arbitrary noise distributions. However, a practical barrier to deploying flow models at scale is the need to run large neural networks (often with billions of parameters) many times to generate high-quality samples. This incurs not just high computational cost but also high latency; in some cases it can take minutes to generate a single sample. Thus, there is a pressing need for methods that accelerate flow-based models by minimizing the number of neural network passes.

A major culprit behind the high cost of sampling from flow models is the geometry of the learned flows. It can be challenging to reason about high-dimensional data, but fortunately we can build intuition about many of the important geometric properties of flows by visualizing them in low dimensions. In fact, we can use exactly the same algorithms used to train large-scale models to train simple 2D flows on toy distributions and reproduce many phenomena of practical interest. In a related project, we developed an interactive web app called Diffusion Explorer [?] that lets users experiment with training and sampling from flow and diffusion models in 2D.

Sampling from a flow model involves simulating the trajectory of an abstract particle as it moves from random noise to real data, repeatedly querying a neural network to determine the particle's velocity at each point in time. When these trajectories are highly curved, simulating them accurately requires taking many small steps, each of which costs a pass through our expensive neural network. Figure 2 above shows that a flow model trained to generate samples from a simple smiley face distribution produces curved trajectories. This curvature, its consequences, and how to mitigate it are the central focus of this article. We will discuss why trajectories generated by flow models have this geometry, why they are challenging to simulate efficiently, and how a simple approach called rectified flows [?] can straighten out the trajectories of flow models to enable faster sampling.

Background

Before diving into the details behind why models trained with flow matching produce curved trajectories and how rectified flows can help, we will first cover some necessary background on flow-based generative models and flow matching. For a more thorough treatment, a great introduction to this topic by some of the original authors of flow matching can be found in [?]. If you already have some familiarity with flow-based generative models and flow matching, feel free to skip ahead to The Problem.

Flow-Based Generative Models

The broad goal of generative modeling is to draw samples from some complex distribution of data (e.g., natural images) for which we have empirical observations but whose true form is unknown. More concretely, given a finite number of samples $\mathcal{X} = \{x_1, \dots, x_n\}$ from a target distribution $q$, our goal is to learn a model that can generate new samples from $q$.

A flow model learns to bridge a simple source probability distribution $p$ that is easy to draw samples from, like a multivariate Gaussian $\mathcal{N}(0, \sigma^2 I)$, to a complex data distribution $q$ by defining a continuous transformation between the two. We define a continuous sequence of probability distributions, called a probability path $(p_t)_{0 \leq t \leq 1}$, that smoothly interpolates between our simple source distribution $p_0$ and our data distribution $p_1 = q$ (see Figure 3). We index this path by a time variable $t \in [0, 1]$, where $t=0$ corresponds to the source distribution and $t=1$ corresponds to the target distribution. By drawing samples from $p_0$ and transforming them over time, we can produce samples distributed according to our data distribution $p_1 = q$.

A flow $\psi_t(x)$ is a time-indexed mapping from $\mathbb{R}^d$ to $\mathbb{R}^d$ that specifies trajectories of points over time; when applied to our samples $X_0 \sim p_0$, it transports them from the source distribution to the target distribution, $X_1 \sim p_1 = q$. The objective of flow-based models is to learn a flow such that for each time $t \in [0, 1]$, the points transformed by the flow, $X_t = \psi_t(X_0)$, are distributed according to the corresponding distribution in our probability path, $X_t \sim p_t$. If we can learn this flow, then we can draw samples from our simple source distribution $p_0$ and transform them into realistic approximations of real-world data with distribution $q$.

Perhaps somewhat counterintuitively, rather than directly modeling the flow $\psi_t(x)$, flow-based generative models instead model a time-dependent velocity field $v_t(x)$ that "generates" the flow. Given this velocity field, we can recover the flow by solving an ordinary differential equation (ODE), a process called simulation. Starting from some initial point $x$ at time $t=0$, we trace the trajectory of this point over time according to the velocity field $v_t$ using the following ODE:

\frac{d}{dt} \psi_t(x) = v_t(\psi_t(x)), \quad \psi_0(x) = x.

The solution to this ODE is the flow $\psi_t(x)$ itself. There are a variety of numerical methods for simulating ODEs, which approximate the continuous trajectory by taking a series of discrete steps. Perhaps the simplest is Euler's method, which approximates the trajectory by taking small linear steps in the direction of the velocity field at each time step: $x_{t + \Delta t} = x_t + \Delta t \cdot v_t(x_t)$.
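Euler's method can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the velocity field is an ordinary Python function `v(x, t)` (a hypothetical stand-in for a trained network) that accepts a batch of points of shape `(n, d)` and returns velocities of the same shape:

```python
import numpy as np

def euler_integrate(v, x0, n_steps):
    """Approximate psi_1(x0) with n_steps Euler steps of size 1 / n_steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * v(x, t)  # x_{t + dt} = x_t + dt * v_t(x_t)
    return x

# Usage: a constant velocity field moves every point by exactly (1, 2) between t = 0 and t = 1.
v_const = lambda x, t: np.ones_like(x) * np.array([1.0, 2.0])
print(euler_integrate(v_const, np.zeros((3, 2)), n_steps=4))  # every row becomes [1. 2.]
```

Each call to `v` corresponds to one neural network pass in a real flow model, so `n_steps` directly controls sampling cost.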

Flow Matching

Now that we are equipped with some background knowledge on flow-based generative models, we can discuss flow matching. I will only give a high level overview of some of the concepts relevant to rectified flows. Please check out [?] for a more thorough introduction.

The motivation behind flow matching is to learn our velocity field $v_t(x)$ without expensive simulation, meaning without having to run Euler integration or some other ODE solver during training. Flow matching allows us to learn $v_t(x)$ by minimizing a simple regression loss!

Flow matching can be broken down into two key steps:

We need to define our probability path $p_t(x)$ for interpolating between our source distribution $p$ and target distribution $q$.
We need to train a velocity field $v_t^\theta(x)$ that generates the path $p_t$ through regression.

Step 1: Defining the Probability Path. We will focus on a specific choice of probability path called the linear path. The linear path can be defined through a simple linear interpolation between our source and target distributions:

X_t = (1-t)X_0 + tX_1 \sim p_t

In the examples I provide throughout this article, our source distribution $p_0$ is always a standard Gaussian distribution $p_0(x) = \mathcal{N}(x|0, I)$ , and our target distribution $q$ is a complex 2D distribution representing a smiley face. However, in general, flow matching affords much more flexibility in the choice of probability paths and source distributions.

Step 2: Regressing the Velocity Field. Now, the second step of flow matching is to "match" the true velocity field $v_t(x_t)$ with an approximation $v_t^\theta(x_t)$, parameterized by a neural network, by optimizing a simple regression objective.

\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t, X_t \sim p_t} ||v_t(X_t) - v_t^\theta(X_t)||^2

However, there is a catch: we do not have direct access to the true velocity field $v_t(x_t)$! It is difficult to construct in practice because, at every point, it depends on the entire unknown data distribution: it averages the velocities of all source-target pairs whose paths pass through that point at time $t$. So, how can we optimize this objective?

Luckily, we can create a related but much simpler objective by conditioning our velocity field on a single sample from our target distribution, $x_1 \sim q$. This yields the conditional velocity field $v_t(x_t | x_1) = \frac{x_1 - x_t}{1 - t}$.
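Substituting the linear path $x_t = (1-t)x_0 + t x_1$ into this expression shows that the conditional velocity is constant along each path; it is simply the straight-line displacement from $x_0$ to $x_1$. This is the step that turns the general conditional objective into the simple loss used in practice:

v_t(x_t | x_1) = \frac{x_1 - x_t}{1 - t} = \frac{x_1 - (1-t)x_0 - t x_1}{1 - t} = \frac{(1-t)(x_1 - x_0)}{1 - t} = x_1 - x_0, \quad t \in [0, 1)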

Equipped with this conditional vector field, we can create a regression objective called conditional flow matching.

\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t, X_0, X_1} ||v_t(X_t | X_1) - v_t^\theta(X_t)||^2

If we then plug in our specific conditional velocity field for our choice of a linear probability path, we get the remarkably simple training objective:

\mathcal{L}_{CFM}(\theta) = \mathbb{E}_{t, X_0, X_1} ||(X_1 - X_0) - v_t^\theta(X_t)||^2

Incredibly, the conditional flow matching and flow matching objectives have the same gradients, $\nabla_\theta \mathcal{L}_{CFM}(\theta) = \nabla_\theta \mathcal{L}_{FM}(\theta)$, meaning we can optimize our tractable conditional flow matching objective and thereby solve the flow matching problem. During training, we simply draw pairs $(x_0, x_1)$ from our source and target distributions, interpolate between them to get $x_t$, and train our velocity field $v_t^\theta(x_t)$ to predict the straight-line velocity $x_1 - x_0$.
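The training recipe just described can be sketched as a NumPy function that builds one batch and evaluates a Monte Carlo estimate of $\mathcal{L}_{CFM}$. This is an illustration under assumptions: `v_theta(x, t)` is a hypothetical stand-in for the neural network, `t` is drawn uniformly from $[0, 1)$, and the gradient step an actual training loop would take (e.g., with an autodiff framework) is omitted:

```python
import numpy as np

def cfm_loss(v_theta, x0, x1, rng):
    """Monte Carlo estimate of L_CFM for a batch of pairs (x0, x1), each of shape (n, d)."""
    n = x0.shape[0]
    t = rng.uniform(0.0, 1.0, size=(n, 1))   # one time per pair, t ~ U[0, 1)
    xt = (1 - t) * x0 + t * x1               # point on the linear path
    target = x1 - x0                         # straight-line (conditional) velocity
    pred = v_theta(xt, t)                    # the model only sees (x_t, t), never x1
    return np.mean(np.sum((target - pred) ** 2, axis=1))

# Usage: if the target distribution is a single point c, the true velocity field is
# v_t(x) = (c - x) / (1 - t), which matches every conditional velocity exactly.
rng = np.random.default_rng(0)
c = np.array([3.0, -1.0])
x0 = rng.standard_normal((256, 2))           # source samples x0 ~ N(0, I)
x1 = np.tile(c, (256, 1))                    # degenerate target q = delta at c
exact = lambda x, t: (c - x) / (1 - t)
print(cfm_loss(exact, x0, x1, rng))          # ~0 up to floating-point error
```

With a non-degenerate target, no function of $(x_t, t)$ alone can drive this loss to zero, which is exactly the tension the rest of the article explores.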

A critical fact worth emphasizing is that we are matching the conditional velocity $v_t(x_t|x_1)$, which is conditioned on the target point $x_1$, with our learned velocity field $v_t^\theta(x_t)$, which only "knows" about the current $x_t$ and $t$. If we were to condition our learned vector field on $x_1$ as well, the problem would become trivial, as the model could simply predict a scaled version of $x_1 - x_t$. So, the model $v_t^\theta(x_t)$ has to infer the likely destination $x_1$ using only the location $x_t$ at time $t$.

The Problem

With the fundamentals of flow models and flow matching established, we can now investigate some of their idiosyncrasies—and how they come up in practice. We showed above that the trajectories produced by a flow model trained with flow matching are curved (see Figure 2). To further illustrate this point, if we superimpose the source and target distributions we can see that this curvature is even more extreme (see Figure 9).


An astute reader might recall that we trained our velocity field $v_t^\theta(x)$ to match straight trajectories $X_1 - X_0$ due to our choice of a linear path. So why does our model then learn curved trajectories, and why is this an issue? Answering the latter question—why curvature is a problem—is more straightforward: the answer is speed.

Curvature is the Enemy of Speed

When drawing new samples from a flow model, we perform numerical integration using the trained velocity field $v_t^\theta(x_t)$. At their core, numerical integration algorithms like Euler's method take finite steps in the direction of the velocity field: $x_{t + \Delta t} = x_t + \Delta t \cdot v_t^\theta(x_t)$. Each step is a local linear approximation of the true trajectory. How accurate this approximation is depends on how curved the trajectories are: the more curved a trajectory, the smaller the steps we must take to avoid deviating from it and degrading sample quality.
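A small NumPy experiment illustrates the point. It is a toy example of our own, not one of the article's figures: it runs Euler's method on a field with straight trajectories and on a rotational field whose trajectories are circular arcs, where both exact solutions are known in closed form:

```python
import numpy as np

def euler(v, x, n_steps):
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(x, i * dt)
    return x

x0 = np.array([1.0, 0.0])
d = np.array([2.0, 1.0])

straight = lambda x, t: d                        # trajectories x0 + t * d (straight lines)
curved = lambda x, t: np.array([-x[1], x[0]])    # circular trajectories, unit angular speed

exact_straight = x0 + d
exact_curved = np.array([np.cos(1.0), np.sin(1.0)])  # x0 rotated by 1 radian

for n in (1, 2, 4, 8, 32):
    err_s = np.linalg.norm(euler(straight, x0, n) - exact_straight)
    err_c = np.linalg.norm(euler(curved, x0, n) - exact_curved)
    print(f"steps={n:3d}  straight error={err_s:.2e}  curved error={err_c:.2e}")
```

The straight field is integrated exactly even with a single step, while the curved field needs many steps (and therefore many velocity evaluations) before the error becomes small.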

The punch line: curvature is the enemy of speed. Highly curved trajectories are challenging to accurately simulate with a small number of steps. This means we need to make many calls to our large neural network representing our vector field $v_t^\theta(x)$ in order to accurately approximate these trajectories, leading to high latency and computational cost. But why does our model learn these curved trajectories in the first place? The answer has to do with how our source and target random variables are jointly distributed, a concept called a coupling.

What is a Coupling?

When training our velocity field $v_t^\theta(x)$ with flow matching, we need to draw pairs $(x_0, x_1)$ from our source and target distributions $p_0$ and $q$ . Something that we glossed over a bit in the section about Flow Matching is how exactly we should draw these pairs. This is actually a crucial design choice, called a coupling, that has a significant impact on the geometry of the learned flow, and is the key culprit behind our curved trajectories.

A coupling is a joint distribution $\pi(x_0, x_1)$ over our source and target random variables. It dictates how the pairs $(x_0, x_1)$ used during training are distributed. The key requirement of a coupling is that its marginals are the source and target distributions: $\pi(x_0) = p$ and $\pi(x_1) = q$.

The simplest form of coupling, and the one we investigate in this article, is an independent coupling (see Figure 11), where we independently draw $X_0 \sim p$ and $X_1 \sim q$, so that $\pi(x_0, x_1) = \pi(x_0)\pi(x_1)$. This allows us to trivially construct pairs $(x_0, x_1)$ during training, and is a natural choice in scenarios where we don't have any known structure associating points from our source and target distributions.

As mentioned above, our choice of an independent coupling is the key culprit behind our curved trajectories. You can see in Figure 11 that the lines connecting independently drawn source and target points cross each other a lot. These intersections lead to curved trajectories because they introduce branches in our paths that our learned velocity field $v_t^\theta(x)$ cannot resolve.

An alternative to the independent coupling is an optimal transport coupling (see Figure 12), which connects source and target points in a way that minimizes the overall cost of transporting mass from the source to the target distribution. This coupling tends to produce fewer crossing paths, which leads to straighter trajectories. However, optimal transport couplings are more challenging to compute, especially in high dimensions, and so they are less commonly used in practice.
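In one dimension, the contrast between the two couplings can be checked directly. Two straight paths $x_t^a$ and $x_t^b$ meet at some $t \in (0, 1)$ exactly when their endpoints swap order, i.e. when $(x_0^a - x_0^b)(x_1^a - x_1^b) < 0$. For squared-distance cost in 1D, the optimal transport coupling between two equal-size samples pairs them in sorted order. The NumPy sketch below counts crossings under both couplings; the bimodal toy target is an illustrative assumption, not the article's smiley face:

```python
import numpy as np

def count_crossings(x0, x1):
    """Number of pairs of 1D linear paths (1 - t) * x0 + t * x1 that cross for some t in (0, 1)."""
    d0 = x0[:, None] - x0[None, :]
    d1 = x1[:, None] - x1[None, :]
    return int(np.sum(d0 * d1 < 0)) // 2   # each unordered pair is counted twice

rng = np.random.default_rng(0)
n = 200
x0 = rng.standard_normal(n)                                           # source p = N(0, 1)
x1 = rng.choice([-3.0, 3.0], size=n) + 0.3 * rng.standard_normal(n)   # toy bimodal target q

independent = count_crossings(x0, x1)            # pairs drawn independently
ot = count_crossings(np.sort(x0), np.sort(x1))   # monotone (1D optimal transport) coupling
print(independent, ot)                           # ot is always 0
```

In higher dimensions there is no such simple sorting rule, which is one reason optimal transport couplings are harder to compute.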

Paths Crossed at the Wrong Time

Our learned flow model $v_t^\theta(x)$ cannot accurately model the crossing paths produced by our independent coupling; this limitation manifests itself as curved trajectories. More precisely, say two paths formed by the pairs $(x_0^a, x_1^a)$ and $(x_0^b, x_1^b)$ intersect (or nearly intersect) at some point $x$ at time $t$. This results in two distinct velocities, $x_1^a - x_0^a$ and $x_1^b - x_0^b$, that our learned velocity field $v_t^\theta(x)$ is supposed to match at the same location $x$ and time $t$. This is not possible, because our learned velocity field is only a function of the current location $x$ and time $t$.

Our learned velocity field $v_t^\theta(x)$ cannot predict both desired velocities at this intersection point, so it ends up predicting their average. The same is true more generally whenever many paths intersect in a small neighborhood: the minimizer of the squared-error objective is the conditional expectation of the velocities passing through that point, $\mathbb{E}[X_1 - X_0 | X_t = x]$. Finally, because this average velocity changes as we move through space and time, the resulting trajectories are curved. So, despite the fact that we train our flow model to match straight-line velocities, we end up with curved trajectories.
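A tiny numerical example makes the averaging concrete. The two paths below use hypothetical endpoints chosen so that both pass through the origin at $t = 0.5$, with velocities $(2, 2)$ and $(2, -2)$. A model that sees only $(x, t)$ must output a single vector there, and the vector that minimizes the total squared error to both targets is their mean:

```python
import numpy as np

x0 = np.array([[-1.0, -1.0], [-1.0, 1.0]])   # sources of paths a and b
x1 = np.array([[1.0, 1.0], [1.0, -1.0]])     # targets of paths a and b
t = 0.5
xt = (1 - t) * x0 + t * x1                    # both rows equal [0, 0]: the paths meet
velocities = x1 - x0                          # [[2, 2], [2, -2]]

def squared_error(u):
    """Flow matching loss at this (x, t) if the model outputs the single vector u."""
    return np.sum((velocities - u) ** 2)

mean_velocity = velocities.mean(axis=0)       # empirical E[X1 - X0 | X_t = 0]
print(mean_velocity)                          # [2. 0.]
print(squared_error(mean_velocity), squared_error(velocities[0]))  # 8.0 16.0
```

The averaged velocity $(2, 0)$ points toward neither target, so a particle following it must change direction later on, which is where the curvature comes from.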

Rectified Flows

We have discussed why curved trajectories are difficult to simulate with a small number of steps, and we now understand why a flow model learns curved trajectories when using an independent coupling. This raises the question: how can we learn straighter trajectories? Rectified flows provide exactly such a solution, and given all of the context above, it is a startlingly simple one lying in plain sight.

The Algorithm

Rectified flows straighten out the trajectories of flows by replacing the naive independent coupling used in vanilla flow-matching training with one induced by the model itself. First, we train a model $v_\theta^1$ with flow matching using an independent coupling. Next, we generate new pairs $(X_0, X_1^1)$ by drawing $X_0 \sim p$ and applying the flow $\psi^1$ generated by $v_\theta^1$ to get $X_1^1 = \psi_1^1(X_0)$. This new coupling $\pi_1 = (X_0, X_1^1)$ is then used to train a new flow model $v_\theta^2$. By repeating this process multiple times, we can progressively straighten out the trajectories of our flow model. The full procedure is outlined in the algorithm below.

Algorithm: Reflow Procedure

Inputs: source distribution $p$, target distribution $q$, number of iterations $K$
Outputs: rectified velocity field $v_\theta^K$

1: Sample pairs $(X_0, X_1)$ from the independent coupling $\pi_0 = p \times q$
2: for $k = 1, 2, \ldots, K$
3:     Train velocity field $v_\theta^k$ on pairs from $\pi_{k-1}$
4:     Generate new pairs: $X_1^k = \psi_1^k(X_0)$ by flowing $X_0 \sim p$ through $v_\theta^k$
5:     Update coupling: $\pi_k = (X_0, X_1^k)$
6: end for
7: return $v_\theta^K$

Algorithm 1: The Reflow procedure iteratively straightens trajectories by retraining on the coupling induced by the previous model.
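The loop in Algorithm 1 can be sketched end to end in NumPy if we swap the neural network for a much simpler regressor. This substitution is our own assumption, made purely to keep the sketch self-contained: each "training" step fits the velocity field by Gaussian-kernel (Nadaraya–Watson) regression of the straight-line velocities $x_1 - x_0$ on the interpolated points $x_t$, which approximates the conditional expectation $\mathbb{E}[X_1 - X_0 | X_t = x]$ that flow matching targets. The helper names, the two-cluster target, the bandwidth, and the step counts are all illustrative choices, not the article's settings:

```python
import numpy as np

def fit_velocity(x0, x1, bandwidth=0.3):
    """Hypothetical stand-in for training v_theta^k: kernel regression of x1 - x0 on x_t."""
    deltas = x1 - x0                                    # straight-line velocities, shape (n, d)

    def v(x, t):
        xt = (1 - t) * x0 + t * x1                      # training points on the linear path at time t
        sq = np.sum((x[:, None, :] - xt[None, :, :]) ** 2, axis=-1)  # (m, n) squared distances
        logw = -sq / (2 * bandwidth**2)
        logw -= logw.max(axis=1, keepdims=True)         # stabilize the normalization
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        return w @ deltas                               # kernel-weighted average velocity, (m, d)

    return v

def euler(v, x, n_steps):
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(x, i * dt)
    return x

def reflow(sample_p, sample_q, n, K, rng, n_steps=50):
    """Algorithm 1 with the kernel stand-in; assumes K >= 1."""
    x0, x1 = sample_p(n, rng), sample_q(n, rng)         # pi_0: independent coupling p x q
    for k in range(1, K + 1):
        v = fit_velocity(x0, x1)                        # "train" v^k on pairs from pi_{k-1}
        x0 = sample_p(n, rng)                           # fresh X_0 ~ p
        x1 = euler(v, x0, n_steps)                      # X_1^k = psi_1^k(X_0)
    return v, (x0, x1)                                  # v^K and the final coupling pi_K

def sample_p(n, rng):
    return rng.standard_normal((n, 2))                  # source: standard Gaussian

def sample_q(n, rng):
    centers = np.array([[-3.0, 3.0], [3.0, 3.0]])       # toy two-cluster target (illustrative)
    return centers[rng.integers(0, 2, size=n)] + 0.3 * rng.standard_normal((n, 2))

def straightness_gap(v, rng, n=300):
    """Mean distance between 1-step and 100-step Euler solutions; 0 for perfectly straight flows."""
    z = sample_p(n, rng)
    return np.mean(np.linalg.norm(euler(v, z, 1) - euler(v, z, 100), axis=1))

rng = np.random.default_rng(0)
for K in (1, 2):
    v, _ = reflow(sample_p, sample_q, n=300, K=K, rng=rng)
    print(f"K={K}: one-step vs 100-step gap = {straightness_gap(v, rng):.3f}")
```

With $K = 1$ this is plain flow matching on the independent coupling. We would typically expect the reflowed field ($K = 2$) to show a smaller one-step gap, but the exact numbers depend on the seed, the bandwidth, and the toy setup, and are not results from the article.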

Why it Works

We draw samples from our trained flow model by solving an ordinary differential equation of the form

\frac{d}{dt} \psi_t(x) = v_t(\psi_t(x)), \quad \psi_0(x) = x.

This forms a deterministic flow, whose trajectories $\psi_t(x)$ are guaranteed to be unique under mild regularity conditions (for example, when $v_t$ is Lipschitz continuous in $x$). This uniqueness property is crucial to understanding why rectified flows work. The uniqueness of trajectories in deterministic flows means that two distinct trajectories cannot intersect at the same point in space and time $(x,t)$. If this did happen, then the two trajectories would have to coincide for all times, contradicting the assumption that they are distinct. Deterministic flows therefore forbid crossing, branching, or merging of trajectories. The deterministic nature of these flows is inherited by the coupling induced by integrating the flow.

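When we generate new pairs $(X_0, X_1^k)$ by flowing samples from our source distribution through our learned flow model, the pairs are connected by trajectories that are guaranteed not to intersect. The straight lines between these new pairs tend to cross much less than those of the independent coupling (in more than one dimension they can still cross, because the trajectories that produced them may be curved). By retraining on this coupling, we therefore remove most of the conflicting velocities at intersection points that caused curvature in the first place.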
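In one dimension, the non-crossing property takes an especially simple form: a deterministic flow must preserve the order of points. The sketch below flows sorted source samples through an arbitrary smooth velocity field (chosen purely for illustration) and checks that the induced coupling $(X_0, X_1)$ is still sorted, so even its straight-line interpolants cannot cross. Euler steps are guaranteed to preserve order when the step size times the field's Lipschitz constant in $x$ is below 1, which holds here ($0.01 \times 3$):

```python
import numpy as np

def v(x, t):
    return np.sin(3.0 * x) + t             # smooth 1D velocity field, Lipschitz constant 3 in x

rng = np.random.default_rng(0)
x0 = np.sort(rng.standard_normal(500))     # sorted source samples X_0 ~ N(0, 1)

x = x0.copy()
n_steps = 100                              # dt = 0.01, so dt * 3 < 1
dt = 1.0 / n_steps
for i in range(n_steps):
    x = x + dt * v(x, i * dt)              # each Euler step is a strictly increasing map
x1 = x

print(np.all(np.diff(x1) >= 0))            # True: trajectories never pass each other

# For the induced coupling, (x0[a] - x0[b]) * (x1[a] - x1[b]) >= 0 for every pair,
# so no two straight-line interpolants between the new pairs cross either.
d0 = x0[:, None] - x0[None, :]
d1 = x1[:, None] - x1[None, :]
print(int(np.sum(d0 * d1 < 0)))            # 0
```

In higher dimensions there is no ordering to preserve, which is why the straight lines of a reflowed coupling can still occasionally cross there.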

Comparisons

We can also compare the trajectories learned by a standard flow matching model versus a rectified flow model (see Figure 15). The rectified flow model learns significantly straighter trajectories, which are easier to simulate with fewer steps.

This difference in curvature has a direct impact on how many steps are needed during sampling. We can observe this effect by comparing how well Euler's method, run with varying numbers of integration steps, approximates the "ground truth" trajectory computed with many steps (see Figure 16). Notice how the rectified flow model produces accurate approximations even with very few steps, while the flow matching model's curved trajectories lead to significant deviation from the true path.

Finally, we can compare the vector fields learned by a standard flow matching model versus a rectified flow model (see Figure 17). The rectified flow model learns a vector field that is more consistent over time, which translates into less curved trajectories.

Acknowledgements

I'd like to acknowledge my friend Sebastián Gutiérrez Hernández for his valuable feedback on this project, particularly on the formal explanations presented in this article. I would also like to thank Benjamin Hoover, Polo Chau, and Vivek Anand for their feedback on the visualizations and writing.

References

How to Cite

If you found this explainer helpful, please consider citing it:

@article{helbling2025rectifiedflows,
title = {A Visual Introduction to Rectified Flows},
author = {Helbling, Alec},
year = {2025},
url = {https://alechelbling.com/rectified-flows}
}