(DM Reconst.) Ch.5 Flow-Based Perspective - From NFs to Flow Matching

Diffusion Model Conceptual Reconstruction following The Principles of Diffusion Models



Hozy Summary

  • Normalizing Flows (NF)
    • Idea)
      • Use a sequence of invertible transformations from $p_{\text{data}}$ to $p_{\text{prior}}$.
        • cf.) Change-of-Variables Formula : \(p(\mathbf{x}) = p_{\text{prior}}(\mathbf{z})\displaystyle\left\vert\text{det}\frac{\partial \mathbf{f}^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right\vert,\quad\text{where } \mathbf{z}=\mathbf{f}^{-1}(\mathbf{x})\).
    • Loss)
      \(\begin{aligned} \mathcal{L}_{\text{NF}}(\phi) &= \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[\log p_\phi(\mathbf{x})\right] \\ &= \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[ \log p_{\text{prior}}(\mathbf{z}) + \log\left\vert\text{det}\frac{\partial \mathbf{f}_\phi^{-1}(\mathbf{x})}{\partial\mathbf{x}}\right\vert \right] \\ \end{aligned}\).
    • Sampling)
      • Draw \(\mathbf{x}_0\sim p_{\text{prior}}\).
      • Compute \(\mathbf{x} = \mathbf{f}_\phi(\mathbf{x}_0)\).
    • Limit)
      • Limited choice of transformation.
      • Trade off between the expressiveness of the model and the computational cost.
        • Must compute the determinant of the Jacobian, which costs $O(D^3)$ in general.
  • Neural ODE (NODE)
    • Idea)
      • Continuous-time transformation adapted from NFs
        • \(\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t),\quad t\in[0,T]\).
    • Loss)
      • MLE of \(\mathcal{L}_\text{NODE}(\phi) := \mathbb{E}_{\mathbf{x}\sim p_\text{data}}\left[ \log p_\phi(\mathbf{x},T) \right]\)
    • Sampling)
      • Draw \(\mathbf{x}(0)\sim p_{\text{prior}}\).
      • Compute \(\displaystyle\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T \mathbf{v}_{\phi^\times}(\mathbf{x}_{t}, t)\text{d}t\).
    • Limit)
      • Expensive: every likelihood evaluation and every sample requires numerically integrating the ODE.
  • Flow Matching
    • Idea)
      • Relationship between \(\mathbf{x}_t,\;p_t,\;\Psi_{0\rightarrow t}\text{, and }\mathbf{v}_t\)
        • \(\mathbf{x}_t\sim p_t\).
        • \(\Psi_{0\rightarrow t}(\mathbf{x}) = \mathbf{x}_t\).
        • \(\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{x}) = \mathbf{v}_t(\mathbf{x})\).
      • \(\mathbf{v}_t(\mathbf{x})\)-prediction
        • i.e.) Parameterize the velocity field, i.e. \(\mathbf{v}_\phi \approx \mathbf{v}_t\).
        • Similarity with the NODE.
      • Since \(\mathbf{v}_t(\mathbf{x})\) is intractable, use conditional density, conditional flow, and conditional velocity field instead
        • i.e.) \(\mathbf{v}_\phi(\cdot\mid\mathbf{z}) \approx \mathbf{v}_t(\cdot\mid\mathbf{z})\).
    • Training)
      • Loss)
        • \(\mathcal{L}_{\text{CFM}}(\phi) = \mathbb{E}_{t,\mathbf{z}\sim\pi(\mathbf{z}),\mathbf{x}_t\sim p_t(\cdot\mid\mathbf{z})}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z}) \Vert^2 \right]\), which equals \(\mathcal{L}_{\text{FM}}(\phi)\) up to a constant \(C\) independent of \(\phi\).
      • Implementation)
        • Obtain \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z})\) by…
          • setting $p_t(\cdot\mid\mathbf{z})$ (conditional probability path) first : Eulerian View
          • setting $\Psi_{0\rightarrow t}(\cdot\mid\mathbf{z})$ (conditional flow) first : Lagrangian View
        • We may choose the latent $\mathbf{z}$ by…
          1. Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
          2. One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)



Prop.) Change-of-Variables Formula of Densities

  • Prop.)
    • For
      • $\mathbf{f}$ : an invertible transformation
      • $\mathbf{z}\sim p_{\text{prior}}$
    • the density of $\mathbf{x} = \mathbf{f}(\mathbf{z})$ is
      • \(p(\mathbf{x}) = p_{\text{prior}}(\mathbf{z})\displaystyle\left\vert\text{det}\frac{\partial \mathbf{f}^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right\vert,\quad\text{where } \mathbf{z}=\mathbf{f}^{-1}(\mathbf{x})\).
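
A minimal 1D sketch of the formula above, assuming a standard normal prior and a hypothetical affine map \(f(z) = az + b\) (the values of `a` and `b` are illustrative only). In this case \(x \sim \mathcal{N}(b, a^2)\), so the change-of-variables density can be checked against the closed form.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of the 1D Gaussian N(mean, std^2)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical invertible map f(z) = a * z + b (illustrative values).
a, b = 2.0, 1.0

def f_inv(x):
    """Inverse map z = f^{-1}(x)."""
    return (x - b) / a

def density_via_change_of_variables(x):
    """p(x) = p_prior(z) * |d f^{-1}(x) / dx| with z = f^{-1}(x)."""
    z = f_inv(x)
    jac_inv = 1.0 / a  # derivative of f^{-1}; a 1x1 "Jacobian"
    return normal_pdf(z) * abs(jac_inv)

x = 0.3
p_formula = density_via_change_of_variables(x)
p_closed_form = normal_pdf(x, mean=b, std=abs(a))  # x = a z + b ~ N(b, a^2)
print(p_formula, p_closed_form)
```

The two printed values should agree up to floating-point error, since an affine map of a Gaussian is again Gaussian.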
Intuition hozy note


5.1 Flow-Based Models: Normalizing Flows and Neural ODEs

Model) Normalizing Flows

Rezende and Mohamed, 2015
hozy note

  • Def.)
    • For
      • \(p_{\text{data}}(\mathbf{x})\) : a complex data distribution
      • \(p_{\text{prior}}(\mathbf{z})\) : a simple prior
      • \(\mathbf{f}_\phi:\mathbb{R}^D\rightarrow\mathbb{R}^D\) with
        • \(\mathbf{x} = \mathbf{f}_\phi(\mathbf{z})\) and \(\mathbf{z}\sim p_{\text{prior}}\)
    • Using the change-of-variables formula, we may get the model likelihood of
      • \(\log p_\phi(\mathbf{x}) = \displaystyle\log p_{\text{prior}}(\mathbf{z}) + \log\left\vert\text{det}\frac{\partial f_\phi^{-1}(\mathbf{x})}{\partial\mathbf{x}}\right\vert\).
  • Training Objective)
    • We may learn the parameters $\phi$ by maximum likelihood, i.e. by maximizing (equivalently, minimizing the negative of)
      • \(\mathcal{L}_{\text{NF}}(\phi) = \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[\log p_\phi(\mathbf{x})\right]\).
  • Limit)
    • A single simple (e.g. linear) transformation lacks expressiveness
      • Sol.) A sequence of $L$ trainable invertible mappings \(\{\mathbf{f}_k\}_{k=0}^{L-1}\)
        • Settings)
          • \(\mathbf{f}_\phi = \mathbf{f}_{L-1}\circ\mathbf{f}_{L-2}\circ\cdots\circ\mathbf{f}_{0}\) where each \(\mathbf{f}_{k}\) is parameterized by a neural network
          • \(\mathbf{x}_{k+1} = \mathbf{f}_{k}(\mathbf{x}_{k}),\quad k=0,\ldots,L-1, \mathbf{z}=\mathbf{x}_{0}\sim p_{\text{prior}}, \mathbf{x}=\mathbf{x}_{L}\).
        • Then we get the log-likelihood of
          • \(\log p_\phi(\mathbf{x}) = \log p_{\text{prior}}(\mathbf{x}_{0}) + \displaystyle\sum_{k=0}^{L-1}\log\left\vert\text{det}\frac{\partial \mathbf{f}_{k}}{\partial \mathbf{x}_{k}}\right\vert^{-1}\).
    • Optimizing the above loss takes \(O(D^3)\) runtime due to computing the Jacobian determinant.
  • Sampling)
    • Draw \(\mathbf{x}_0\sim p_{\text{prior}}\).
    • Compute \(\mathbf{x} = \mathbf{f}_\phi(\mathbf{x}_0)\).
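
The stacked log-likelihood and the sampling procedure above can be sketched as follows. This is only an illustration under strong assumptions: each layer is a hypothetical elementwise affine map \(\mathbf{f}_k(\mathbf{x}) = \mathbf{s}_k\odot\mathbf{x} + \mathbf{t}_k\) with fixed (not trained) values, so the Jacobian is diagonal and its log-determinant is just \(\sum_i \log\vert s_{k,i}\vert\). Real NF layers (coupling, autoregressive, ...) are nonlinear but follow the same bookkeeping.

```python
import math
import random

# Hypothetical layers: f_k(x) = scale * x + shift, applied elementwise (D = 2).
layers = [
    ([2.0, 0.5], [0.1, -0.3]),
    ([1.5, 3.0], [1.0, 0.2]),
    ([0.8, 1.2], [-0.5, 0.4]),
]

def log_prior(z):
    """log N(z; 0, I)."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)

def forward(z):
    """Sampling direction: x = f_{L-1}( ... f_0(z) ... )."""
    x = list(z)
    for scale, shift in layers:
        x = [s * xi + t for s, xi, t in zip(scale, x, shift)]
    return x

def log_likelihood(x):
    """log p(x) = log p_prior(x_0) - sum_k log|det d f_k / d x_k|."""
    h = list(x)
    sum_log_det = 0.0
    for scale, shift in reversed(layers):  # invert the layers from last to first
        h = [(hi - t) / s for hi, s, t in zip(h, scale, shift)]
        sum_log_det += sum(math.log(abs(s)) for s in scale)
    return log_prior(h) - sum_log_det

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(2)]  # draw x_0 ~ p_prior
x = forward(z)                               # push through the flow
ll = log_likelihood(x)
print(x, ll)
```

For a dense Jacobian the log-determinant would instead require an \(O(D^3)\) factorization, which is the cost noted in the Limit above.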


Concept) Residual Flow



Model) Neural ODEs

Chen et al., 2018

  • Idea)
    • Recall the discrete sequence of invertible mappings of Normalizing Flows of \(\mathbf{x}_{k+1} = \mathbf{f}_{k}(\mathbf{x}_{k})\).
    • Using the parameterized velocity field, we may reconstruct it as
      • \(\mathbf{x}_{k+1} = \mathbf{x}_{k} + \mathbf{v}_{\phi_k}(\mathbf{x}_{k}, k)\).
    • This formulation corresponds to the Euler discretization of the continuous time ODE of \(\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\).
    • In the limit of infinite layers and vanishing step size ($\Delta t\rightarrow0$), the discrete NF converges to a continuous model : Neural ODE
  • Model)
    • A continuous transformation
      • \(\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t),\quad t\in[0,T]\).
        • where
          • \(\mathbf{x}_{t}\in\mathbb{R}^D\) : the state at time $t$
          • \(\mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\) : a neural network parameterized by $\phi$
    • Using the Instantaneous Change-of-Variables Formula below, we may get the log density of $\mathbf{x}_T$ under the neural ODE as
      • \(\displaystyle \log p_\phi(\mathbf{x}_{T}, T) = \log p_{\text{prior}}(\mathbf{x}_{0}, 0) -\int_0^T \nabla_{\mathbf{x}}\cdot \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\text{d}t\), where \(\nabla_{\mathbf{x}}\cdot\mathbf{v}_\phi\) is the divergence (the trace of the Jacobian of \(\mathbf{v}_\phi\)).
  • Training)
    • Goal) \(p_\phi(\cdot, T) \approx p_{\text{data}}\)
    • Loss)
      • MLE of \(\mathcal{L}_\text{NODE}(\phi) := \mathbb{E}_{\mathbf{x}\sim p_\text{data}}\left[ \log p_\phi(\mathbf{x},T) \right]\)
    • Limit)
      • Expensive: every likelihood evaluation requires numerically integrating the ODE.
      • The adjoint sensitivity method computes gradients via an auxiliary ODE with $O(1)$ memory complexity.
  • Sampling)
    • Draw \(\mathbf{x}(0)\sim p_{\text{prior}}\).
    • Compute \(\displaystyle\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T \mathbf{v}_{\phi^\times}(\mathbf{x}_{t}, t)\text{d}t\).
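
A toy sketch of the sampling and log-density computation above, using forward Euler in 1D. The linear field \(v(x,t) = Ax\) is a hypothetical stand-in for the trained network \(\mathbf{v}_\phi\); it is chosen only because the exact solution \(x_T = x_0 e^{AT}\) and the exact density \(\mathcal{N}(0, e^{2AT})\) are available for comparison.

```python
import math

A = -0.7  # hypothetical coefficient of the linear velocity field
T = 1.0

def velocity(x, t):
    """Stand-in for v_phi(x, t)."""
    return A * x

def divergence(x, t):
    """Divergence of the field above; in 1D this is dv/dx = A."""
    return A

def log_prior(x):
    """log N(x; 0, 1)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def solve(x0, n_steps=10000):
    """Euler-integrate the state and the divergence integral together."""
    dt = T / n_steps
    x, div_integral = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        div_integral += divergence(x, t) * dt
        x += velocity(x, t) * dt
    return x, div_integral

x0 = 1.3
xT, div_integral = solve(x0)
log_p_ode = log_prior(x0) - div_integral  # instantaneous change of variables

# Closed form for this linear field: p_T = N(0, exp(2 A T)).
std_T = math.exp(A * T)
log_p_exact = -0.5 * (xT / std_T) ** 2 - math.log(std_T) - 0.5 * math.log(2.0 * math.pi)
print(xT, log_p_ode, log_p_exact)
```

The loop over `n_steps` is what makes each likelihood evaluation costly; a practical implementation would likely use an adaptive solver and a divergence estimator rather than this fixed-step scheme.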


Concept) Instantaneous Change-of-Variables Formula


Desc.) Physical Interpretation of the Continuity Equation

From Appendix B



5.2. Flow Matching Framework

  • Settings)
    • \(\{p_t\}_{t\in[0,1]}\) : a predefined probability path s.t.
      • Boundary settings
        • \(p_0 = p_{\text{src}},\quad p_1 = p_{\text{tgt}}\).
      • Marginal Density
        • For a latent variable $\mathbf{z}\sim\pi(\mathbf{z})$ drawn from an unknown distribution
          • Here, the common choices for $\mathbf{z}$ are
            • Two-sided conditioning : \(\mathbf{z}=(\mathbf{x}_0, \mathbf{x}_1)\sim p_{\text{src}}(\mathbf{x}_0) p_{\text{tgt}}(\mathbf{x}_1)\)
            • One-sided conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
        • we may denote the marginal density $p_t(\mathbf{x}_t)$ as
          • \(p_t(\mathbf{x}_t) = \displaystyle\int p_t(\mathbf{x}_t\mid\mathbf{z})\pi(\mathbf{z})\text{d}\mathbf{z}\) with \((\pi(\mathbf{z}), \{p_t(\cdot\mid\mathbf{z})\})\) chosen to satisfy the boundary conditions.
    • \(\mathbf{v}_t(\mathbf{x}_t)\) : a time-dependent vector(velocity) field whose associated ODE flow matches \(\{p_t\}_{t\in[0,1]}\)
      • s.t. induced ODE enables a sample-wise transformation of
        • \(\displaystyle\frac{\text{d}\mathbf{x}(t)}{\text{d}t} = \mathbf{v}_{t}(\mathbf{x}(t)),\quad t\in[0,1]\).
        • \(\mathbf{x}(t)\sim p_t\).
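      • or, equivalently, captured by the Continuity Equation
        • \(\displaystyle\partial_t p_t(\mathbf{x}) + \nabla_{\mathbf{x}}\cdot\big(p_t(\mathbf{x})\mathbf{v}_t(\mathbf{x})\big) = 0\).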
      • Any \(\mathbf{v}_t\) that satisfies the above Continuity Equation is allowed!
  • Training)
    • $\mathbf{v}$-prediction
      • \(\mathcal{L}_{\text{FM}}(\phi) = \mathbb{E}_{t,\mathbf{x}_t\sim p_t}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t) \Vert^2 \right]\).
    • However, $\mathbf{v}_t(\mathbf{x}_t)$ is intractable.
      • Just like DDPM and DSM did, we may utilize the conditional velocity field of $\mathbf{v}_t(\mathbf{x}\mid\mathbf{z})$ as follows:
        \(\begin{aligned} \mathcal{L}_{\text{FM}}(\phi) &= \mathbb{E}_{t,\mathbf{z}\sim\pi(\mathbf{z}),\mathbf{x}_t\sim p_t(\cdot\mid\mathbf{z})}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z}) \Vert^2 \right] + C \\ &\triangleq \mathcal{L}_{\text{CFM}}(\phi) + C \\ \end{aligned}\)
    • To obtain \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z})\), we may choose either
      • to set $p_t(\cdot\mid\mathbf{z})$ (conditional probability path) first : Eulerian View
      • to set $\Psi_{0\rightarrow t}(\cdot\mid\mathbf{z})$ (conditional flow) first : Lagrangian View
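
A Monte Carlo sketch of the CFM loss, under assumptions not taken from the text: 1D data, \(p_{\text{src}} = \mathcal{N}(0,1)\), a hypothetical \(p_{\text{tgt}} = \mathcal{N}(M, S^2)\), two-sided conditioning with an independent coupling, and the deterministic interpolation \(\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1\), whose conditional velocity is \(\mathbf{x}_1 - \mathbf{x}_0\). For this Gaussian pair the marginal velocity \(\mathbb{E}[\mathbf{x}_1-\mathbf{x}_0\mid\mathbf{x}_t]\) follows from standard Gaussian conditioning, so we can check that it scores a lower CFM loss than naive candidates even though the regression target is only the conditional velocity.

```python
import random

M, S = 2.0, 0.5  # hypothetical target N(M, S^2); the source is N(0, 1)

def marginal_velocity(x, t):
    """E[x1 - x0 | x_t = x] for independent Gaussian endpoints."""
    cov = t * S * S - (1.0 - t)            # Cov(x1 - x0, x_t)
    var = (1.0 - t) ** 2 + (t * S) ** 2    # Var(x_t)
    return M + cov / var * (x - t * M)

def cfm_loss(v_model, n_samples=20000, seed=0):
    """Monte Carlo estimate of E || v_model(x_t, t) - v_t(x_t | z) ||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.random()                    # t ~ U[0, 1)
        x0 = rng.gauss(0.0, 1.0)            # x0 ~ p_src
        x1 = rng.gauss(M, S)                # x1 ~ p_tgt
        xt = (1.0 - t) * x0 + t * x1        # x_t ~ p_t(. | z)
        target = x1 - x0                    # conditional velocity
        total += (v_model(xt, t) - target) ** 2
    return total / n_samples

loss_opt = cfm_loss(marginal_velocity)
loss_const = cfm_loss(lambda x, t: M)       # best constant field
loss_zero = cfm_loss(lambda x, t: 0.0)
print(loss_opt, loss_const, loss_zero)
```

Note that `loss_opt` stays well above zero: the residual is the constant \(C\) separating \(\mathcal{L}_{\text{CFM}}\) from \(\mathcal{L}_{\text{FM}}\), which does not depend on the model.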


Concept) Eulerian View

  • Goal)
    • Construct $p_t(\cdot\mid\mathbf{z})$ first and then derive the corresponding $\mathbf{v}_t(\cdot\mid\mathbf{z})$.
  • E.g.) Both source and target distribution are Gaussian.
    • Procedure)
      1. Identify the marginal density \(p_t(\mathbf{x}_t)\).
      2. Choose a marginal flow map \(\Psi_{0\rightarrow t}(\mathbf{x})\).
      3. Derive the marginal velocity field \(\mathbf{v}_t(\mathbf{x})\) from \(\Psi_{0\rightarrow t}(\mathbf{x})\).
      4. Show that the closed-form characterization on the marginal is valid on the conditional as well.
        • Two strategies for choosing the conditioning variable $\mathbf{z}$ in $p_t(\cdot\mid\mathbf{z})$
          1. Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
          2. One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
    • Derivation)
      1. Identifying the marginal \(p_t(\mathbf{x}_t)\).
        • Boundary conditions
          • \(p_{\text{src}} = p_0 = \mathcal{N}\left(\mathbf{x};\;\mu(0),\sigma^2(0)\mathbf{I}\right)\).
          • \(p_{\text{tgt}} = p_1 = \mathcal{N}\left(\mathbf{x};\;\mu(1),\sigma^2(1)\mathbf{I}\right)\).
        • Then the marginal density can be denoted as the interpolation of
          • \(p_t(\mathbf{x}_t) = \mathcal{N}\left(\mathbf{x}_t;\;\mu(t),\sigma^2(t)\mathbf{I}\right)\).
          • And, \(\{p_t\}_{t\in[0,1]}\) connects \(p_{\text{src}}\) and \(p_{\text{tgt}}\).
      2. Choosing the marginal flow map \(\Psi_{0\rightarrow t}(\mathbf{x})\).
        • There are many velocity fields that induce an ODE flow s.t. $\mathbf{x}\sim p_0$ implies \(\Psi_{0\rightarrow t}(\mathbf{x})\sim p_t\).
        • We may choose a flow map s.t.
          • \(\displaystyle\Psi_{0\rightarrow t}(\mathbf{x}) := \mu(t) + \sigma(t)\left(\frac{\mathbf{x}-\mu(0)}{\sigma(0)}\right)\).
        • Since the flow map should be invertible, we have
          • \(\forall\mathbf{y} = \Psi_{0\rightarrow t}(\mathbf{x}),\quad\exists\Psi_{0\rightarrow t}^{-1}:\mathbb{R}^D\rightarrow\mathbb{R}^D\text{ s.t. }\mathbf{x} = \Psi_{0\rightarrow t}^{-1}(\mathbf{y}) = \Psi_{t\rightarrow 0}(\mathbf{y})\).
      3. Deriving the velocity field.
        • \(\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\).
          • i.e.) \(\mathbf{v}_t = \partial_t \Psi_{0\rightarrow t} \circ \Psi^{-1}_{0\rightarrow t}\).
          • Why?)
            • Letting $\mathbf{x}_0$ be the initial point, we may get
              • \(\displaystyle\Psi_{0\rightarrow t}(\mathbf{x}_0) = \mu(t) + \sigma(t)\left(\frac{\mathbf{x}_0-\mu(0)}{\sigma(0)}\right)\quad\cdots\quad(A)\).
              • \(\displaystyle\frac{\text{d}}{\text{d}t} (\Psi_{0\rightarrow t}(\mathbf{x}_0)) = \mathbf{v}_t (\Psi_{0\rightarrow t}(\mathbf{x}_0))\quad\cdots\quad(B)\).
            • Writing $\hat{\mathbf{x}} := \Psi_{0\rightarrow t}(\mathbf{x}_0)$ for the point reached at time $t$,
              • we may rewrite (A) as
                \(\begin{aligned} \mathbf{x}_0 &= \Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})&\cdots\quad(C1)&\quad(\because\text{Invertibility}) \\ &= \mu(0) + \sigma(0)\left(\frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\right) &\cdots\quad(C2) \end{aligned}\).
                • which is equivalent to \(\displaystyle\frac{\mathbf{x}_0-\mu(0)}{\sigma(0)} = \frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\quad\cdots(C3)\)
              • we may rewrite (B) as
                • \(\displaystyle\frac{\text{d}}{\text{d}t} (\Psi_{0\rightarrow t}(\mathbf{x}_0)) = \mathbf{v}_t (\hat{\mathbf{x}})\quad\cdots\quad(D)\).
            • Plugging (C1) into (D), we may get
              • \(\displaystyle\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})) = \mathbf{v}_t(\hat{\mathbf{x}})\quad\cdots\quad(E)\).
            • From our definition, we may also get
              • \(\displaystyle \frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{x}) = \mu'(t) + \sigma'(t)\left(\frac{\mathbf{x}-\mu(0)}{\sigma(0)}\right)\) given $\mathbf{x}\quad\cdots(F)$
            • Replacing \(\mathbf{x}\) in (F) with \(\mathbf{x}_0 = \Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})\) from (C1,2), we get \(\begin{aligned} \frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}\bigg(\overbrace{\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})}^{=\mathbf{x}_0}\bigg) &= \mu'(t) + \sigma'(t)\Big(\frac{\overbrace{\mathbf{x}_0}^{=\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})}-\mu(0)}{\sigma(0)}\Big) & (\text{From } (C1,2)) \\ &= \mu'(t) + \sigma'(t)\Big(\frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\Big) & (\because (C3)) \\ &= \frac{\sigma'(t)}{\sigma(t)}(\hat{\mathbf{x}}-\mu(t)) + \mu'(t) \\ &= \mathbf{v}_t(\hat{\mathbf{x}}) & (\because (E)) \end{aligned}\).
            • Again replacing $\hat{\mathbf{x}}$ with $\mathbf{x}$ we get
              • \(\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\).
      4. Showing the closed-form in marginal also works in conditional.
        • Two strategies for choosing the conditioning variable $\mathbf{z}$ in $p_t(\cdot\mid\mathbf{z})$
          1. Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
            • Settings)
              • \(\mathbf{z}\sim \pi(\mathbf{z}) := p_{\text{src}}(\mathbf{x}_0) p_{\text{tgt}}(\mathbf{x}_1)\).
              • \(p_t(\cdot\mid\mathbf{z}=(\mathbf{x}_0,\mathbf{x}_1)) := \mathcal{N}(\mathbf{x}_t;\; a_t \mathbf{x}_0 + b_t\mathbf{x}_1, \sigma^2\mathbf{I})\).
                • where $a_t+b_t = 1$, i.e. the linear interpolation between \(\mathbf{x}_0, \mathbf{x}_1\)
            • Deriving $\mathbf{v}_t(\cdot\mid\mathbf{z})$
              • Then applying the derivation of $\mathbf{v}_t$ in marginal, we may get
                • \(\mathbf{v}_t(\mathbf{x}\mid\mathbf{z}) = a_t' \mathbf{x}_0 + b_t' \mathbf{x}_1\).
            • CFM Loss
              • \(\mathcal{L}_{\text{CFM}} = \mathbb{E}_{t, \mathbf{x}_0\sim p_{\text{src}}, \mathbf{x}_1\sim p_{\text{tgt}}}\left\Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - (a_t' \mathbf{x}_0 + b_t' \mathbf{x}_1) \right\Vert^2\).
          2. One-Sided Conditioning : Consider the case of \(\mathbf{z} = \mathbf{x}_1\)
            • Settings)
              • \(\mathbf{z}\sim \pi(\mathbf{z}) := p_{\text{data}}(\mathbf{x}_1)\).
              • \(p_t(\cdot\mid\mathbf{z}=\mathbf{x}_1) := \mathcal{N}(\mathbf{x}_t;\; b_t \mathbf{x}_1, a_t^2\mathbf{I})\).
                • where \(\begin{cases} a_0 = 1, b_0 = 0 \\ a_1 = 0, b_1 = 1 \\ \end{cases} \Leftrightarrow \begin{cases} p_0(\cdot\mid\mathbf{z}=\mathbf{x}_1) = \mathcal{N}(\cdot;\; \mathbf{0},\mathbf{I}) \\ p_1(\cdot\mid\mathbf{z}=\mathbf{x}_1) = \delta(\cdot - \mathbf{x}_1) \\ \end{cases}\)
            • Deriving $\mathbf{v}_t(\cdot\mid\mathbf{z})$
              • Then applying the derivation of $\mathbf{v}_t$ in marginal, we may get
                • \(\mathbf{v}_t(\mathbf{x}\mid\mathbf{x}_1) = b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x} - b_t\mathbf{x}_1)\).
            • CFM Loss
              • \(\mathcal{L}_{\text{CFM}} = \mathbb{E}_{t, \mathbf{x}_0\sim p_{\text{src}}, \mathbf{x}_1\sim p_{\text{tgt}}}\left\Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \left[ b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x}_t - b_t\mathbf{x}_1) \right] \right\Vert^2\), with \(\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1\).
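
A numerical sanity check of step 3 of the derivation, in 1D: integrating \(\frac{\text{d}\mathbf{x}}{\text{d}t} = \mathbf{v}_t(\mathbf{x})\) with the derived velocity should reproduce the chosen flow map \(\Psi_{0\rightarrow t}\). The schedules \(\mu(t) = 3t^2\) and \(\sigma(t) = e^{-t}\) are hypothetical choices for illustration, and the classical RK4 integrator is an implementation choice, not something prescribed by the text.

```python
import math

# Hypothetical schedules (illustrative only).
def mu(t): return 3.0 * t * t
def dmu(t): return 6.0 * t
def sigma(t): return math.exp(-t)
def dsigma(t): return -math.exp(-t)

def flow(x0, t):
    """Psi_{0->t}(x0) = mu(t) + sigma(t) * (x0 - mu(0)) / sigma(0)."""
    return mu(t) + sigma(t) * (x0 - mu(0.0)) / sigma(0.0)

def velocity(x, t):
    """v_t(x) = sigma'(t) / sigma(t) * (x - mu(t)) + mu'(t)."""
    return dsigma(t) / sigma(t) * (x - mu(t)) + dmu(t)

def integrate(x0, t_end, n_steps=200):
    """Classical RK4 integration of dx/dt = v_t(x) from 0 to t_end."""
    dt = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(x + dt * k3, t + dt)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return x

x0 = -0.8
x_ode = integrate(x0, 1.0)
x_flow = flow(x0, 1.0)
print(x_ode, x_flow)
```

Agreement of the two values for several `x0` is consistent with \(\mathbf{v}_t = \partial_t \Psi_{0\rightarrow t} \circ \Psi^{-1}_{0\rightarrow t}\); it is a check, not a proof.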


Concept) Lagrangian View

  • Goal)
    • Design the conditional flow map $\Psi_{0\rightarrow t}(\cdot;\mathbf{z})$ first and directly obtain the corresponding $\mathbf{v}_t(\cdot\mid\mathbf{z})$ by differentiating along particle trajectories, i.e. \(\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{y};\mathbf{z}) = \mathbf{v}_t\Big(\underbrace{\Psi_{0\rightarrow t}(\mathbf{y};\mathbf{z})}_{\mathbf{x}}\mid\mathbf{z}\Big)\)
  • e.g.) Conditional Affine Flow
    • Settings)
      • \(\mathbf{z}\sim\pi\) : a latent variable
        • which can be (but not limited to) either…
          1. Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
          2. One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
        • Refer to Eulerian View for more details.
      • \(\Psi_{0\rightarrow t}(\mathbf{x}_0;\mathbf{z}) := \mu_t(\mathbf{z}) + \mathbf{A}_t(\mathbf{z})\mathbf{x}_0,\quad t\in[0,1]\) : the time-varying conditional affine flow
        • where
          • \(\mu_t(\mathbf{z})\in\mathbb{R}^D\).
            • cf.) \(\mu_0(\mathbf{z}) = \mathbf{0}\)
          • \(\mathbf{A}_t(\mathbf{z})\in\mathbb{R}^{D\times D}\) is invertible for \(t\in[0,1]\)
            • cf.) \(\mathbf{A}_0(\mathbf{z}) = \mathbf{I}\)
    • Induce the Conditional Path \(p_t(\cdot\mid\mathbf{z})\) :
      • \(p_t(\cdot\mid\mathbf{z}) = [\Psi_t(\cdot;\mathbf{z})]_\# p_0\).
        • cf.) \(\#\) means pushing the density forward which is equivalent to the change-of-variables
          • i.e.) \(p_t(\mathbf{x}_t) = p_0(\Psi_t^{-1}(\mathbf{x}_t)) \left\vert \det \nabla_{\mathbf{x}_t} \Psi_t^{-1}(\mathbf{x}_t) \right\vert\)
        • Prop.)
          • Since $\Psi_{0\rightarrow t}(\cdot;\mathbf{z})$ is affine, if $p_0$ is Gaussian, then $p_t(\cdot\mid\mathbf{z})$ is Gaussian as well.
      • \(p_t(\cdot) = \displaystyle\int p_t(\cdot\mid\mathbf{z})\pi(\mathbf{z})\text{d}\mathbf{z}\).
    • Derive the Conditional Velocity \(\mathbf{v}_t(\cdot\mid\mathbf{z})\) :
      • Applying the derivation result \(\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\) from the Eulerian View, we may get
        • \(\mathbf{v}_t(\mathbf{x}\mid\mathbf{z}) = \mu_t'(\mathbf{z}) + \mathbf{A}_t'(\mathbf{z})\mathbf{A}_t(\mathbf{z})^{-1}(\mathbf{x}-\mu_t(\mathbf{z}))\).
    • Depending on the conditioning strategy…
      1. One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
        • We may set $\Psi_t$ as
          \(\begin{cases} \mu_t(\mathbf{z}) = b_t\mathbf{z} \\ \mathbf{A}_t(\mathbf{z}) = a_t \mathbf{I} \\ \end{cases},\quad \begin{cases} a_0 = 1, b_0 = 0 \\ a_1 = 0, b_1 = 1 \\ \end{cases}\).
        • Then, we have
          • \(\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1,\quad \mathbf{v}_t(\mathbf{x}\mid\mathbf{x}_1) = b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x} - b_t \mathbf{x}_1)\).
        • Thus,
          • \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{x}_1) = a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1\).
      2. Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
        • We may set $\Psi_t$ as
          \(\begin{cases} \mu_t(\mathbf{x}_0, \mathbf{x}_1) = b_t\mathbf{x}_1 \\ \mathbf{A}_t(\mathbf{x}_0, \mathbf{x}_1) = a_t \mathbf{I} \\ \end{cases}\).
        • Then, we have
          • \(\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1,\quad p_t(\cdot\mid\mathbf{x}_0, \mathbf{x}_1) = \delta(\cdot - (a_t\mathbf{x}_0 + b_t\mathbf{x}_1))\).
        • Thus,
          • \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_1) = a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1\).
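
A small sketch of the conditional affine flow with \(\mu_t = b_t\mathbf{x}_1\) and \(\mathbf{A}_t = a_t\mathbf{I}\), assuming the linear schedule \(a_t = 1-t,\ b_t = t\) (a common choice, used here only for illustration; the sample points are arbitrary). It checks that evaluating the conditional velocity along the trajectory gives \(a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1\), and that this matches a finite-difference time derivative of the flow map.

```python
# Assumed linear schedule: a_t = 1 - t, b_t = t.
def a(t): return 1.0 - t
def da(t): return -1.0
def b(t): return t
def db(t): return 1.0

def flow(x0, x1, t):
    """Psi_{0->t}(x0; z) = mu_t + A_t x0 with mu_t = b_t x1 and A_t = a_t I."""
    return [a(t) * p + b(t) * q for p, q in zip(x0, x1)]

def conditional_velocity(x, x1, t):
    """v_t(x | z) = mu_t' + A_t' A_t^{-1} (x - mu_t); needs a_t != 0."""
    return [db(t) * q + da(t) / a(t) * (xi - b(t) * q) for xi, q in zip(x, x1)]

x0 = [0.5, -1.0]   # arbitrary source point
x1 = [2.0, 0.3]    # arbitrary target point
t = 0.3

xt = flow(x0, x1, t)
v_formula = conditional_velocity(xt, x1, t)
v_direct = [da(t) * p + db(t) * q for p, q in zip(x0, x1)]  # a_t' x0 + b_t' x1

# Lagrangian check: differentiate the trajectory itself (central difference).
h = 1e-6
v_fd = [(u - w) / (2.0 * h)
        for u, w in zip(flow(x0, x1, t + h), flow(x0, x1, t - h))]
print(v_formula, v_direct, v_fd)
```

With this schedule the regression target reduces to \(\mathbf{x}_1 - \mathbf{x}_0\), the same for both conditioning strategies. The division by \(a_t\) is why the closed-form velocity is only evaluated for \(t < 1\) here.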

Model) Rectified Flow



