(DM Reconst.) Ch.5 Flow-Based Perspective - From NFs to Flow Matching

Diffusion Model Conceptual Reconstruction following The Principles of Diffusion Models

Hozy Summary

Normalizing Flows (NF)
- Idea)
  - Use a sequence of invertible transformations from $p_{\text{data}}$ to $p_{\text{prior}}$.
    - cf.) Change-of-Variables Formula : $p(\mathbf{x}) = p_{\text{prior}}(\mathbf{z})\displaystyle\left\vert\text{det}\frac{\partial \mathbf{f}^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right\vert,\quad\text{where } \mathbf{z}=\mathbf{f}^{-1}(\mathbf{x})$.
- Loss)
  $\begin{aligned} \mathcal{L}_{\text{NF}}(\phi) &= \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[\log p_\phi(\mathbf{x})\right] \\ &= \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[ \log p_{\text{prior}}(\mathbf{z}) + \log\left\vert\text{det}\frac{\partial \mathbf{f}_\phi^{-1}(\mathbf{x})}{\partial\mathbf{x}}\right\vert \right] \\ \end{aligned}$ (to be maximized).
- Sampling)
  - Draw $\mathbf{x}_0\sim p_{\text{prior}}$.
  - Compute $\mathbf{x} = \mathbf{f}_\phi(\mathbf{x}_0)$.
- Limit)
  - Limited choice of transformation.
  - Trade off between the expressiveness of the model and the computational cost.
    - Must calculate the determinant of the Jacobian, which costs $O(D^3)$ in general.
Neural ODE (NODE)
- Idea)
  - Continuous-time extension of NFs
    - $\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t),\quad t\in[0,T]$.
- Loss)
  - MLE of $\mathcal{L}_\text{NODE}(\phi) := \mathbb{E}_{\mathbf{x}\sim p_\text{data}}\left[ \log p_\phi(\mathbf{x},T) \right]$
- Sampling)
  - Draw $\mathbf{x}(0)\sim p_{\text{prior}}$.
  - Compute $\displaystyle\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T \mathbf{v}_{\phi^\times}(\mathbf{x}_{t}, t)\text{d}t$.
- Limit)
  - ODE solver is expensive due to the integration.
Flow Matching
- Idea)
  - Relationship between $\mathbf{x}_t,\;p_t,\;\Psi_{0\rightarrow t}\text{, and }\mathbf{v}_t$
    - $\mathbf{x}_t\sim p_t$.
    - $\Psi_{0\rightarrow t}(\mathbf{x}) = \mathbf{x}_t$.
    - $\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{x}) = \mathbf{v}_t(\mathbf{x})$.
  - $\mathbf{v}_t(\mathbf{x})$-prediction
    - i.e.) Parameterize the velocity field, i.e. $\mathbf{v}_\phi \approx \mathbf{v}_t$.
    - Similarity with the NODE.
  - Since $\mathbf{v}_t(\mathbf{x})$ is intractable, use conditional density, conditional flow, and conditional velocity field instead
    - i.e.) $\mathbf{v}_\phi(\cdot\mid\mathbf{z}) \approx \mathbf{v}_t(\cdot\mid\mathbf{z})$.
- Training)
  - Loss)
    - $\mathcal{L}_{\text{CFM}}(\phi) = \mathbb{E}_{t,\mathbf{z}\sim\pi(\mathbf{z}),\mathbf{x}_t\sim p_t(\cdot\mid\mathbf{z})}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z}) \Vert^2 \right]$, which equals $\mathcal{L}_{\text{FM}}(\phi)$ up to a constant $C$ independent of $\phi$.
  - Implementation)
    - Obtain $\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z})$ by…
      - setting $p_t(\cdot\mid\mathbf{z})$ (conditional probability path) first : Eulerian View
      - setting $\Psi_{0\rightarrow t}(\cdot\mid\mathbf{z})$ (conditional flow) first : Lagrangian View
    - We may choose the latent $\mathbf{z}$ by…
      1. Two-Sided Conditioning : $\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)$
      2. One-Sided Conditioning : $\mathbf{z} = \mathbf{x}_0$ or $\mathbf{z} = \mathbf{x}_1$

Prop.) Change-of-Variables Formula of Densities

Prop.)
- For
  - $\mathbf{f}$ : an invertible transformation
  - $\mathbf{z}\sim p_{\text{prior}}$
- the density of $\mathbf{x} = \mathbf{f}(\mathbf{z})$ is
  - $p(\mathbf{x}) = p_{\text{prior}}(\mathbf{z})\displaystyle\left\vert\text{det}\frac{\partial \mathbf{f}^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right\vert,\quad\text{where } \mathbf{z}=\mathbf{f}^{-1}(\mathbf{x})$.
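
The formula can be sanity-checked numerically in one dimension. The sketch below assumes a hypothetical affine map $f(z) = az + b$ with a standard normal prior (the values of `a` and `b` are illustrative, not from the text) and compares the change-of-variables density with the known closed form $\mathcal{N}(b, a^2)$.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of N(mean, std^2) evaluated at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical invertible map f(z) = a*z + b (a != 0), chosen only for illustration.
a, b = 2.0, -1.0

def f_inv(x):
    """Inverse map z = f^{-1}(x)."""
    return (x - b) / a

def density_via_change_of_variables(x):
    # p(x) = p_prior(f^{-1}(x)) * |d f^{-1}(x) / dx|, and d f^{-1}/dx = 1/a here.
    return normal_pdf(f_inv(x)) * abs(1.0 / a)

xs = [-3.0, -1.0, 0.5, 2.0]
formula = [density_via_change_of_variables(x) for x in xs]
# For z ~ N(0, 1), x = a*z + b is known to follow N(b, a^2).
closed_form = [normal_pdf(x, mean=b, std=abs(a)) for x in xs]
```

The two lists should agree up to floating-point error; in higher dimensions the factor `abs(1.0 / a)` becomes the absolute Jacobian determinant of $\mathbf{f}^{-1}$.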

Intuition hozy note

5.1 Flow-Based Models: Normalizing Flows and Neural ODEs

Model) Normalizing Flows

Rezende and Mohamed, 2015
hozy note

Def.)
- For
  - $p_{\text{data}}(\mathbf{x})$ : a complex data distribution
  - $p_{\text{prior}}(\mathbf{z})$ : a simple prior
  - $\mathbf{f}_\phi:\mathbb{R}^D\rightarrow\mathbb{R}^D$ with
    - $\mathbf{x} = f_\phi(\mathbf{z})$ and $\mathbf{z}\sim p_{\text{prior}}$
- Using the change-of-variables formula, we may get the model likelihood of
  - $\log p_\phi(\mathbf{x}) = \displaystyle\log p_{\text{prior}}(\mathbf{z}) + \log\left\vert\text{det}\frac{\partial f_\phi^{-1}(\mathbf{x})}{\partial\mathbf{x}}\right\vert$.
Training Objective)
- We may learn parameters $\phi$ by maximum likelihood, i.e. by maximizing
  - $\mathcal{L}_{\text{NF}}(\phi) = \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[\log p_\phi(\mathbf{x})\right]$.
Limit)
- Simple linear transformation lacks expressiveness
  - Sol.) A sequence of $L$ trainable invertible mappings $\{\mathbf{f}_k\}_{k=0}^{L-1}$
    - Settings)
      - $\mathbf{f}_\phi = \mathbf{f}_{L-1}\circ\mathbf{f}_{L-2}\circ\cdots\circ\mathbf{f}_{0}$ where each $\mathbf{f}_{k}$ is parameterized by a neural network
      - $\mathbf{x}_{k+1} = \mathbf{f}_{k}(\mathbf{x}_{k}),\quad k=0,\ldots,L-1, \mathbf{z}=\mathbf{x}_{0}\sim p_{\text{prior}}, \mathbf{x}=\mathbf{x}_{L}$.
    - Then we get the log-likelihood of
      - $\log p_\phi(\mathbf{x}) = \log p_{\text{prior}}(\mathbf{x}_{0}) + \displaystyle\sum_{k=0}^{L-1}\log\left\vert\text{det}\frac{\partial \mathbf{f}_{k}}{\partial \mathbf{x}_{k}}\right\vert^{-1}$.
- Optimizing the above loss takes $O(D^3)$ runtime due to computing the Jacobian determinant.
  - Sol.) Using the Planar Flows or Residual Flows
Sampling)
- Draw $\mathbf{x}_0\sim p_{\text{prior}}$.
- Compute $\mathbf{x} = \mathbf{f}_\phi(\mathbf{x}_0)$.
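
The composed log-likelihood and the sampling procedure above can be sketched for a toy 1-D flow. The affine layers $f_k(x) = s_k x + c_k$ and their values are assumptions made for illustration; real flows use neural-network layers with tractable Jacobians. Because the composition of affine maps is affine, the result can be compared with a closed-form Gaussian.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

# Hypothetical 1-D flow: L affine layers f_k(x) = s_k * x + c_k with s_k != 0.
layers = [(1.5, 0.2), (0.5, -1.0), (-2.0, 0.3)]  # (s_k, c_k), illustrative values

def forward(z):
    """Sampling direction: x = f_{L-1}(... f_0(z))."""
    x = z
    for s, c in layers:
        x = s * x + c
    return x

def log_prob(x):
    """log p_phi(x) = log p_prior(x_0) - sum_k log|det d f_k / d x_k|."""
    log_det_sum = 0.0
    for s, c in reversed(layers):   # invert the layers from last to first
        x = (x - c) / s
        log_det_sum += math.log(abs(s))
    log_prior = -0.5 * x * x - 0.5 * LOG_2PI   # standard normal prior at x_0
    return log_prior - log_det_sum

def sample(rng):
    """Draw x_0 from the prior and push it through the flow."""
    return forward(rng.gauss(0.0, 1.0))

# The composition is affine, so x ~ N(forward(0), (prod s_k)^2) in closed form.
scale = math.prod(s for s, _ in layers)
shift = forward(0.0)

def closed_form_log_prob(x):
    u = (x - shift) / scale
    return -0.5 * u * u - 0.5 * LOG_2PI - math.log(abs(scale))
```

Calling `log_prob(x)` and `closed_form_log_prob(x)` at the same point should give matching values; `sample(random.Random(0))` draws one sample.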

Concept) Residual Flow

Model) Neural ODEs

Chen et al., 2018

Idea)
- Recall that Normalizing Flows apply a discrete sequence of invertible mappings $\mathbf{x}_{k+1} = \mathbf{f}_{k}(\mathbf{x}_{k})$.
- Using the parameterized velocity field, we may reconstruct it as
  - $\mathbf{x}_{k+1} = \mathbf{x}_{k} + \mathbf{v}_{\phi_k}(\mathbf{x}_{k}, k)$.
- This formulation corresponds to the Euler discretization of the continuous time ODE of $\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)$.
- In the limit of infinite layers and vanishing step size ($\Delta t\rightarrow0$), the discrete NF converges to a continuous model : Neural ODE
Model)
- A continuous transformation
  - $\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t),\quad t\in[0,T]$.
    - where
      - $\mathbf{x}_{t}\in\mathbb{R}^D$ : the state at time $t$
      - $\mathbf{v}_{\phi}(\mathbf{x}_{t}, t)$ : a neural network parameterized by $\phi$
- Using the Instantaneous Change-of-Variables Formula below, we may get the log density of $\mathbf{x}_T$ under the neural ODE as
  - $\displaystyle \log p_\phi(\mathbf{x}_{T}, T) = \log p_{\text{prior}}(\mathbf{x}_{0}, 0) -\int_0^T \nabla_{\mathbf{x}}\cdot \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\text{d}t$.
Training)
- Goal) $p_\phi(\cdot, T) \approx p_{\text{data}}$
- Loss)
  - MLE of $\mathcal{L}_\text{NODE}(\phi) := \mathbb{E}_{\mathbf{x}\sim p_\text{data}}\left[ \log p_\phi(\mathbf{x},T) \right]$
- Limit)
  - ODE solver is expensive due to the integration.
  - The adjoint sensitivity method computes gradients via an auxiliary ODE with $O(1)$ memory complexity.
Sampling)
- Draw $\mathbf{x}(0)\sim p_{\text{prior}}$.
- Compute $\displaystyle\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T \mathbf{v}_{\phi^\times}(\mathbf{x}_{t}, t)\text{d}t$.
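
A minimal sketch of sampling and log-density tracking with a fixed-step Euler solver. The linear field $v(x, t) = 0.5x$ is a hypothetical stand-in for the neural network $\mathbf{v}_\phi$, chosen because the exact solution is known; practical implementations typically use adaptive solvers and estimate the divergence rather than computing it exactly.

```python
import math

def velocity(x, t):
    """Hypothetical stand-in for v_phi(x, t); a real model would be a neural network."""
    return 0.5 * x

def divergence(x, t):
    """d v / d x for the 1-D field above (the trace of the Jacobian in general)."""
    return 0.5

def integrate(x0, log_p0, T=1.0, n_steps=1000):
    """Euler-integrate the state and its log-density from t=0 to t=T."""
    dt = T / n_steps
    x, log_p = x0, log_p0
    for k in range(n_steps):
        t = k * dt
        log_p -= divergence(x, t) * dt   # d/dt log p = -div v
        x += velocity(x, t) * dt         # dx/dt = v
    return x, log_p

x0 = 0.7
log_p0 = -0.5 * x0 * x0 - 0.5 * math.log(2.0 * math.pi)  # standard normal prior
xT, log_pT = integrate(x0, log_p0)

# Closed form for this linear field with T = 1: x_T = x_0 * e^{1/2}, so x_T ~ N(0, e).
xT_exact = x0 * math.exp(0.5)
log_pT_exact = -0.5 * xT_exact**2 / math.e - 0.5 * math.log(2.0 * math.pi * math.e)
```

The Euler state `xT` approaches `xT_exact` as `n_steps` grows, which illustrates the accuracy-versus-cost trade-off listed under Limit.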

Concept) Instantaneous Change-of-Variables Formula

Thm.)
- $\displaystyle\frac{\text{d}}{\text{d}t} \log p_\phi(\mathbf{x}_{t}, t) = -\nabla_{\mathbf{x}}\cdot \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)$.
cf.)
- A continuous version of Change-of-Variables Formula
- Chen et al., 2018
- A special case of the Fokker-Planck equation
  - Continuity Equation
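
Why?) A brief sketch of how the theorem follows from the Continuity Equation $\partial_t p_\phi + \nabla_{\mathbf{x}}\cdot(\mathbf{v}_\phi p_\phi) = 0$, assuming $p_\phi(\cdot, t)$ is smooth and strictly positive along the trajectory. The arguments $(\mathbf{x}_t, t)$ are omitted on the right-hand side.
- $\begin{aligned} \frac{\text{d}}{\text{d}t}\log p_\phi(\mathbf{x}_t, t) &= \frac{\partial_t p_\phi}{p_\phi} + \nabla_{\mathbf{x}}\log p_\phi\cdot\frac{\text{d}\mathbf{x}_t}{\text{d}t} & (\text{chain rule}) \\ &= -\frac{\nabla_{\mathbf{x}}\cdot(\mathbf{v}_\phi p_\phi)}{p_\phi} + \nabla_{\mathbf{x}}\log p_\phi\cdot\mathbf{v}_\phi & (\text{Continuity Equation and } \tfrac{\text{d}\mathbf{x}_t}{\text{d}t} = \mathbf{v}_\phi) \\ &= -\nabla_{\mathbf{x}}\cdot\mathbf{v}_\phi - \mathbf{v}_\phi\cdot\nabla_{\mathbf{x}}\log p_\phi + \nabla_{\mathbf{x}}\log p_\phi\cdot\mathbf{v}_\phi & (\text{product rule}) \\ &= -\nabla_{\mathbf{x}}\cdot\mathbf{v}_\phi \end{aligned}$.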

Desc.) Physical Interpretation of the Continuity Equation

From Appendix B

5.2. Flow Matching Framework

Settings)
- $\{p_t\}_{t\in[0,1]}$ : a predefined probability path s.t.
  - Boundary settings
    - $p_0 = p_{\text{src}},\quad p_1 = p_{\text{tgt}}$.
  - Marginal Density
    - For a latent variable $\mathbf{z}\sim\pi(\mathbf{z})$ drawn from an unknown distribution
      - Here, the common choices for $\mathbf{z}$ are
        
        Two-sided conditioning : $\mathbf{z}=(\mathbf{x}_0, \mathbf{x}_1)\sim p_{\text{src}}(\mathbf{x}_0) p_{\text{tgt}}(\mathbf{x}_1)$
        
        One-sided conditioning : $\mathbf{z} = \mathbf{x}_0$ or $\mathbf{z} = \mathbf{x}_1$
    - we may denote the marginal density $p_t(\mathbf{x}_t)$ as
      - $p_t(\mathbf{x}_t) = \displaystyle\int p_t(\mathbf{x}_t\mid\mathbf{z})\pi(\mathbf{z})\text{d}\mathbf{z}$ with $(\pi(\mathbf{z}), {p_t(\cdot\mid\mathbf{z})})$ chosen to satisfy the boundary conditions.
- $\mathbf{v}_t(\mathbf{x}_t)$ : a time-dependent vector (velocity) field whose associated ODE flow matches $\{p_t\}_{t\in[0,1]}$
  - s.t. induced ODE enables a sample-wise transformation of
    - $\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{t}(\mathbf{x}(t)),\quad t\in[0,1]$.
    - $\mathbf{x}(t)\sim p_t$.
  - or, equivalently captured with the Continuity Equation of
    - $\displaystyle\frac{\partial p_t(\mathbf{x})}{\partial t} + \nabla\cdot(\mathbf{v}_t(\mathbf{x})p_t(\mathbf{x})) = 0$.
      - cf.) For the intuition behind the Continuity Equation, refer to the desc. above (Physical Interpretation of the Continuity Equation).
  - Any $\mathbf{v}_t$ that satisfies the above Continuity Equation is allowed!
Training)
- $\mathbf{v}$-prediction
  - $\mathcal{L}_{\text{FM}}(\phi) = \mathbb{E}_{t,\mathbf{x}_t\sim p_t}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t) \Vert^2 \right]$.
- However, $\mathbf{v}_t(\mathbf{x}_t)$ is intractable.
  - Just like DDPM and DSM did, we may utilize the conditional velocity field $\mathbf{v}_t(\mathbf{x}\mid\mathbf{z})$ as follows:
    $\begin{aligned} \mathcal{L}_{\text{FM}}(\phi) &= \mathbb{E}_{t,\mathbf{z}\sim\pi(\mathbf{z}),\mathbf{x}_t\sim p_t(\cdot\mid\mathbf{z})}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z}) \Vert^2 \right] + C \\ &\triangleq \mathcal{L}_{\text{CFM}}(\phi) + C \\ \end{aligned}$
    - where $C$ does not depend on $\phi$, so both losses share the same minimizer.
- To obtain $\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z})$, we may choose either
  - to set $p_t(\cdot\mid\mathbf{z})$ (conditional probability path) first : Eulerian View
  - to set $\Psi_{0\rightarrow t}(\cdot\mid\mathbf{z})$ (conditional flow) first : Lagrangian View
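
Since $\mathcal{L}_{\text{FM}}$ and $\mathcal{L}_{\text{CFM}}$ differ only by a constant, regressing on the conditional velocity should recover the marginal velocity. The sketch below illustrates this at a single fixed $t$ in an assumed toy setup (1-D Gaussian source and target, independent coupling, linear path $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$), where the marginal velocity is linear in $x$ and a least-squares fit stands in for the network $\mathbf{v}_\phi$. None of these choices come from the text.

```python
import random

rng = random.Random(0)

# Assumed toy setup (not from the text): 1-D source N(0, 1), target N(m, s^2),
# independent coupling, linear path x_t = (1 - t) * x_0 + t * x_1.
m, s, t, n = 3.0, 2.0, 0.5, 100_000

xs, targets = [], []
for _ in range(n):
    x0 = rng.gauss(0.0, 1.0)
    x1 = rng.gauss(m, s)
    xs.append((1.0 - t) * x0 + t * x1)   # x_t ~ p_t(. | z) with z = (x0, x1)
    targets.append(x1 - x0)              # conditional velocity v_t(x_t | z)

# Minimizing the CFM loss over v(x) = w * x + c at this fixed t is ordinary least squares.
mean_x = sum(xs) / n
mean_y = sum(targets) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
w = cov_xy / var_x
c = mean_y - w * mean_x

# Marginal velocity of the Gaussian path N(mu(t), sigma(t)^2):
# v_t(x) = sigma'(t)/sigma(t) * (x - mu(t)) + mu'(t), with mu(t) = t * m and
# sigma(t)^2 = (1 - t)^2 + t^2 * s^2.
var_t = (1.0 - t) ** 2 + t ** 2 * s ** 2
w_exact = (t * s ** 2 - (1.0 - t)) / var_t     # sigma'(t) / sigma(t)
c_exact = m - w_exact * t * m                  # mu'(t) - w_exact * mu(t)
```

The fitted `(w, c)` should be close to `(w_exact, c_exact)` up to Monte Carlo error, even though no individual regression target equals the marginal velocity.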

Concept) Eulerian View

Goal)
- Construct $p_t(\cdot\mid\mathbf{z})$ first and then derive the corresponding $\mathbf{v}_t(\cdot\mid\mathbf{z})$.
E.g.) Both source and target distribution are Gaussian.
- Procedure)
  1. Identify the marginal density $p_t(\mathbf{x}_t)$.
  2. Choose a marginal flow map $\Psi_{0\rightarrow t}(\mathbf{x})$.
  3. Derive the marginal velocity field $\mathbf{v}_t(\mathbf{x})$ from $\Psi_{0\rightarrow t}(\mathbf{x})$.
  4. Show that the closed-form characterization on the marginal is valid on the conditional as well.
    - Two strategies for choosing the $\mathbf{z}$ from $p_t(\cdot\mid\mathbf{z})$
      1. Two-Sided Conditioning : $\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)$
      2. One-Sided Conditioning : $\mathbf{z} = \mathbf{x}_0$ or $\mathbf{z} = \mathbf{x}_1$
- Derivation)
  1. Identifying the marginal $p_t(\mathbf{x}_t)$.
    - Boundary conditions
      - $p_{\text{src}} = p_0 = \mathcal{N}\left(\mathbf{x};\;\mu(0),\sigma^2(0)\mathbf{I}\right)$.
      - $p_{\text{tgt}} = p_1 = \mathcal{N}\left(\mathbf{x};\;\mu(1),\sigma^2(1)\mathbf{I}\right)$.
    - Then the marginal density can be denoted as the interpolation of
      - $p_t(\mathbf{x}_t) = \mathcal{N}\left(\mathbf{x}_t;\;\mu(t),\sigma^2(t)\mathbf{I}\right)$.
      - And, $\{p_t\}_{t\in[0,1]}$ connects $p_{\text{src}}$ and $p_{\text{tgt}}$.
  2. Choosing the marginal flow map $\Psi_{0\rightarrow t}(\mathbf{x})$.
    - There are many velocity fields that induce an ODE flow s.t. $\mathbf{x}\sim p_0$ implies $\Psi_{0\rightarrow t}(\mathbf{x})\sim p_t$.
    - We may choose a flow map s.t.
      - $\displaystyle\Psi_{0\rightarrow t}(\mathbf{x}) := \mu(t) + \sigma(t)\left(\frac{\mathbf{x}-\mu(0)}{\sigma(0)}\right)$.
    - Since the flow map should be invertible, we have
      - $\forall\mathbf{y} = \Psi_{0\rightarrow t}(\mathbf{x}),\quad\exists\Psi_{0\rightarrow t}^{-1}:\mathbb{R}^D\rightarrow\mathbb{R}^D\text{ s.t. }\mathbf{x} = \Psi_{0\rightarrow t}^{-1}(\mathbf{y}) = \Psi_{t\rightarrow 0}(\mathbf{y})$.
  3. Deriving the velocity field.
    - $\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)$.
      - i.e.) $\mathbf{v}_t = \partial_t \Psi_{0\rightarrow t} \circ \Psi^{-1}_{0\rightarrow t}$.
      - Why?)
        
        Letting $\mathbf{x}_0$ be the initial point, we may get
        
        $\displaystyle\Psi_{0\rightarrow t}(\mathbf{x}_0) = \mu(t) + \sigma(t)\left(\frac{\mathbf{x}_0-\mu(0)}{\sigma(0)}\right)\quad\cdots\quad(A)$.
        
        $\displaystyle\frac{\text{d}}{\text{d}t} (\Psi_{0\rightarrow t}(\mathbf{x}_0)) = \mathbf{v}_t (\Psi_{0\rightarrow t}(\mathbf{x}_0))\quad\cdots\quad(B)$.
        
        Writing $\hat{\mathbf{x}} := \Psi_{0\rightarrow t}(\mathbf{x}_0)$ for the point reached at time $t$, we may rewrite (A) as
        $\begin{aligned} \mathbf{x}_0 &= \Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})&\cdots\quad(C1)&\quad(\because\text{Invertibility}) \\ &= \mu(0) + \sigma(0)\left(\frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\right) &\cdots\quad(C2) \end{aligned}$.
        
        which is equivalent to $\displaystyle\frac{\mathbf{x}_0-\mu(0)}{\sigma(0)} = \frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\quad\cdots(C3)$
        
        we may rewrite (B) as
        
        $\displaystyle\frac{\text{d}}{\text{d}t} (\Psi_{0\rightarrow t}(\mathbf{x}_0)) = \mathbf{v}_t (\hat{\mathbf{x}})\quad\cdots\quad(D)$.
        
        Plugging (C1) into (D), we may get
        
        $\displaystyle\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})) = \mathbf{v}_t(\hat{\mathbf{x}})\quad\cdots\quad(E)$.
        
        From our definition, we may also get
        
        $\displaystyle \frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{x}) = \mu'(t) + \sigma'(t)\left(\frac{\mathbf{x}-\mu(0)}{\sigma(0)}\right)$ given $\mathbf{x}\quad\cdots(F)$
        
        Replacing $\mathbf{x}$ in (F) with $\mathbf{x}_0 = \Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})$ from (C1,2), we get $\begin{aligned} \frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}\bigg(\overbrace{\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})}^{=\mathbf{x}_0}\bigg) &= \mu'(t) + \sigma'(t)\Big(\frac{\overbrace{\mathbf{x}_0}^{=\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})}-\mu(0)}{\sigma(0)}\Big) & (\text{From } (C1,2)) \\ &= \mu'(t) + \sigma'(t)\Big(\frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\Big) & (\because (C3)) \\ &= \frac{\sigma'(t)}{\sigma(t)}(\hat{\mathbf{x}}-\mu(t)) + \mu'(t) \\ &= \mathbf{v}_t(\hat{\mathbf{x}}) & (\because (E)) \end{aligned}$.
        
        Again replacing $\hat{\mathbf{x}}$ with $\mathbf{x}$ we get
        
        $\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)$.
  4. Showing the closed-form in marginal also works in conditional.
    - Two strategies for choosing the $\mathbf{z}$ from $p_t(\cdot\mid\mathbf{z})$
      1. Two-Sided Conditioning : $\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)$
        
        Settings)
        
        $\mathbf{z}\sim \pi(\mathbf{z}) := p_{\text{src}}(\mathbf{x}_0) p_{\text{tgt}}(\mathbf{x}_1)$.
        
        $p_t(\cdot\mid\mathbf{z}=(\mathbf{x}_0,\mathbf{x}_1)) := \mathcal{N}(\mathbf{x}_t;\; a_t \mathbf{x}_0 + b_t\mathbf{x}_1, \sigma^2\mathbf{I})$.
        
        where $a_t+b_t = 1$, i.e. the linear interpolation between $\mathbf{x}_0, \mathbf{x}_1$
        
        Deriving $\mathbf{v}_t(\cdot\mid\mathbf{z})$
        
        Applying the marginal derivation of $\mathbf{v}_t$ with $\mu(t) = a_t\mathbf{x}_0 + b_t\mathbf{x}_1$ and constant $\sigma$ (so $\sigma'(t) = 0$), we may get
        
        $\mathbf{v}_t(\mathbf{x}\mid\mathbf{z}) = a_t' \mathbf{x}_0 + b_t' \mathbf{x}_1$.
        
        CFM Loss
        
        $\mathcal{L}_{\text{CFM}} = \mathbb{E}_{t, \mathbf{x}_0\sim p_{\text{src}}, \mathbf{x}_1\sim p_{\text{tgt}}}\left\Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - (a_t' \mathbf{x}_0 + b_t' \mathbf{x}_1) \right\Vert^2$.
      2. One-Sided Conditioning : Consider the case of $\mathbf{z} = \mathbf{x}_1$
        
        Settings)
        
        $\mathbf{z}\sim \pi(\mathbf{z}) := p_{\text{data}}(\mathbf{x}_1)$.
        
        $p_t(\cdot\mid\mathbf{z}=\mathbf{x}_1) := \mathcal{N}(\mathbf{x}_t;\; b_t \mathbf{x}_1, a_t^2\mathbf{I})$.
        
        where $\begin{cases} a_0 = 1, b_0 = 0 \\ a_1 = 0, b_1 = 1 \\ \end{cases} \Leftrightarrow \begin{cases} p_0(\cdot\mid\mathbf{z}=\mathbf{x}_1) = \mathcal{N}(\cdot;\; \mathbf{0},\mathbf{I}) \\ p_1(\cdot\mid\mathbf{z}=\mathbf{x}_1) = \delta(\cdot - \mathbf{x}_1) \\ \end{cases}$
        
        Deriving $\mathbf{v}_t(\cdot\mid\mathbf{z})$
        
        Then applying the derivation of $\mathbf{v}_t$ in marginal, we may get
        
        $\mathbf{v}_t(\mathbf{x}\mid\mathbf{x}_1) = b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x} - b_t\mathbf{x}_1)$.
        
        CFM Loss
        
        $\mathcal{L}_{\text{CFM}} = \mathbb{E}_{t, \mathbf{x}_0\sim p_{\text{src}}, \mathbf{x}_1\sim p_{\text{tgt}}}\left\Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \left[ b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x}_t - b_t\mathbf{x}_1) \right] \right\Vert^2$, where $\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1$.
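
The closed-form velocity from step 3 can be checked against the Continuity Equation with finite differences. This is a 1-D sketch; the schedules `mu` and `sigma` are assumed purely for illustration, and any smooth $\mu(t)$ with $\sigma(t) > 0$ should behave the same way.

```python
import math

# Assumed schedules for illustration only.
def mu(t):
    return 2.0 * t

def d_mu(t):
    return 2.0

def sigma(t):
    return 1.0 - 0.5 * t

def d_sigma(t):
    return -0.5

def p(x, t):
    """Marginal density N(x; mu(t), sigma(t)^2) in 1-D."""
    u = (x - mu(t)) / sigma(t)
    return math.exp(-0.5 * u * u) / (sigma(t) * math.sqrt(2.0 * math.pi))

def v(x, t):
    """Closed-form velocity: sigma'/sigma * (x - mu) + mu'."""
    return d_sigma(t) / sigma(t) * (x - mu(t)) + d_mu(t)

def continuity_residual(x, t, h=1e-4):
    """Central-difference estimate of d/dt p + d/dx (v * p); should be close to 0."""
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2.0 * h)
    dflux_dx = (v(x + h, t) * p(x + h, t) - v(x - h, t) * p(x - h, t)) / (2.0 * h)
    return dp_dt + dflux_dx

residuals = [continuity_residual(x, t)
             for x in (-1.0, 0.3, 1.5)
             for t in (0.2, 0.5, 0.8)]
```

Every entry of `residuals` should be near zero, limited only by the finite-difference step `h`.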

Concept) Lagrangian View

Goal)
- Design the conditional flow map $\Psi_{0\rightarrow t}(\cdot;\mathbf{z})$ first and directly obtain the corresponding $\mathbf{v}_t(\cdot\mid\mathbf{z})$ by differentiating along particle trajectories, i.e. $\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{y};\mathbf{z}) = \mathbf{v}_t\Big(\underbrace{\Psi_{0\rightarrow t}(\mathbf{y};\mathbf{z})}_{\mathbf{x}}\mid\mathbf{z}\Big)$
e.g.) Conditional Affine Flow
- Settings)
  - $\mathbf{z}\sim\pi$ : a latent variable
    - which can be (but is not limited to) either…
      1. Two-Sided Conditioning : $\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)$
      2. One-Sided Conditioning : $\mathbf{z} = \mathbf{x}_0$ or $\mathbf{z} = \mathbf{x}_1$
    - Refer to Eulerian View for more details.
  - $\Psi_{0\rightarrow t}(\mathbf{x}_0;\mathbf{z}) := \mu_t(\mathbf{z}) + \mathbf{A}_t(\mathbf{z})\mathbf{x}_0,\quad t\in[0,1]$ : the time-varying conditional affine flow
    - where
      - $\mu_t(\mathbf{z})\in\mathbb{R}^D$.
        
        cf.) $\mu_0(\mathbf{z}) = \mathbf{0}$
      - $\mathbf{A}_t(\mathbf{z})\in\mathbb{R}^{D\times D}$ is invertible for $t\in[0,1]$
        
        cf.) $\mathbf{A}_0(\mathbf{z}) = \mathbf{I}$
- Induce the Conditional Path $p_t(\cdot\mid\mathbf{z})$ :
  - $p_t(\cdot\mid\mathbf{z}) = [\Psi_t(\cdot;\mathbf{z})]_\# p_0$.
    - cf.) $\#$ means pushing the density forward which is equivalent to the change-of-variables
      - i.e.) $p_t(\mathbf{x}_t) = p_0(\Psi_t^{-1}(\mathbf{x}_t)) \left\vert \det \nabla_{\mathbf{x}_t} \Psi_t^{-1}(\mathbf{x}_t) \right\vert$
    - Prop.)
      - Since $\Psi_t(\cdot;\mathbf{z})$ is affine, if $p_0$ is Gaussian, then $p_t(\cdot\mid\mathbf{z})$ is Gaussian as well.
  - $p_t(\cdot) = \displaystyle\int p_t(\cdot\mid\mathbf{z})\pi(\mathbf{z})\text{d}\mathbf{z}$.
- Derive the Conditional Velocity $\mathbf{v}_t(\cdot\mid\mathbf{z})$ :
  - Applying the derivation result $\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)$ from the Eulerian View, we may get
    - $\mathbf{v}_t(\mathbf{x}\mid\mathbf{z}) = \mu_t'(\mathbf{z}) + \mathbf{A}_t'(\mathbf{z})\mathbf{A}_t(\mathbf{z})^{-1}(\mathbf{x}-\mu_t(\mathbf{z}))$.
- Depending on the conditioning strategy…
  1. One-Sided Conditioning : $\mathbf{z} = \mathbf{x}_0$ or $\mathbf{z} = \mathbf{x}_1$ (here we take $\mathbf{z} = \mathbf{x}_1$)
    - We may set $\Psi_t$ as
      $\begin{cases} \mu_t(\mathbf{z}) = b_t\mathbf{z} \\ \mathbf{A}_t(\mathbf{z}) = a_t \mathbf{I} \\ \end{cases},\quad \begin{cases} a_0 = 1, b_0 = 0 \\ a_1 = 0, b_1 = 1 \\ \end{cases}$.
    - Then, we have
      - $\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1,\quad \mathbf{v}_t(\mathbf{x}\mid\mathbf{x}_1) = b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x} - b_t \mathbf{x}_1)$.
    - Thus,
      - $\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{x}_1) = a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1$.
  2. Two-Sided Conditioning : $\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)$
    - We may set $\Psi_t$ as
      $\begin{cases} \mu_t(\mathbf{x}_0, \mathbf{x}_1) = b_t\mathbf{x}_1 \\ \mathbf{A}_t(\mathbf{x}_0, \mathbf{x}_1) = a_t \mathbf{I} \\ \end{cases}$.
    - Then, we have
      - $\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1,\quad p_t(\cdot\mid\mathbf{x}_0, \mathbf{x}_1) = \delta(\cdot - (a_t\mathbf{x}_0 + b_t\mathbf{x}_1))$.
    - Thus,
      - $\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_1) = a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1$.
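
A small numerical sketch of the conditional affine flow with $\mu_t = b_t\mathbf{x}_1$ and $\mathbf{A}_t = a_t\mathbf{I}$. The linear schedule $a_t = 1 - t$, $b_t = t$ and the sample points are assumptions for illustration. It checks that the velocity formula evaluated on the trajectory reduces to $a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1$, and compares both with a finite-difference time derivative of the flow.

```python
# Assumed linear schedule (illustrative): a_t = 1 - t, b_t = t.
def a(t):
    return 1.0 - t

def b(t):
    return t

D_A, D_B = -1.0, 1.0   # time derivatives a_t' and b_t' of the schedule above

def flow(x0, x1, t):
    """Conditional affine flow mu_t(z) + A_t(z) x0 with mu_t = b_t x1 and A_t = a_t I."""
    return [a(t) * u + b(t) * w for u, w in zip(x0, x1)]

def conditional_velocity(x, x1, t):
    """v_t(x | z) = mu_t' + A_t' A_t^{-1} (x - mu_t), valid while a_t != 0."""
    return [D_B * w + D_A / a(t) * (xi - b(t) * w) for xi, w in zip(x, x1)]

x0, x1, t = [0.5, -1.0], [2.0, 1.0], 0.3
xt = flow(x0, x1, t)
v_eulerian_form = conditional_velocity(xt, x1, t)
v_trajectory_form = [D_A * u + D_B * w for u, w in zip(x0, x1)]   # a_t' x0 + b_t' x1

# Finite-difference derivative of the trajectory as an independent check.
h = 1e-6
v_finite_diff = [(hi - lo) / (2.0 * h)
                 for hi, lo in zip(flow(x0, x1, t + h), flow(x0, x1, t - h))]
```

All three velocity lists should coincide up to floating-point error. With this schedule the target reduces to $\mathbf{x}_1 - \mathbf{x}_0$.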

Model) Rectified Flow

Refer to the previous note on RF
