(DM Reconst.) Ch.5 Flow-Based Perspective - From NFs to Flow Matching
Diffusion Model Conceptual Reconstruction following The Principles of Diffusion Models
Hozy Summary
- Normalizing Flows (NF)
- Idea)
- Use a sequence of invertible transformations between $p_{\text{data}}$ and $p_{\text{prior}}$.
- cf.) Change-of-Variables Formula : \(p(\mathbf{x}) = p_{\text{prior}}(\mathbf{z})\displaystyle\left\vert\text{det}\frac{\partial \mathbf{f}^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right\vert,\quad\text{where } \mathbf{z}=\mathbf{f}^{-1}(\mathbf{x})\).
- Loss)
- MLE of \(\begin{aligned} \mathcal{L}_{\text{NF}}(\phi) &= \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[\log p_\phi(\mathbf{x})\right] \\ &= \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[ \log p_{\text{prior}}(\mathbf{z}) + \log\left\vert\text{det}\frac{\partial \mathbf{f}_\phi^{-1}(\mathbf{x})}{\partial\mathbf{x}}\right\vert \right] \\ \end{aligned}\).
- Sampling)
- Draw \(\mathbf{x}_0\sim p_{\text{prior}}\).
- Compute \(\mathbf{x} = \mathbf{f}_\phi(\mathbf{x}_0)\).
- Limit)
- Limited choice of transformation.
- Trade off between the expressiveness of the model and the computational cost.
- Must compute the determinant of the Jacobian, which costs $O(D^3)$ in general.
- Neural ODE (NODE)
- Idea)
- A continuous-time transformation that generalizes the discrete layers of NFs
- \(\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t),\quad t\in[0,T]\).
- Loss)
- MLE of \(\mathcal{L}_\text{NODE}(\phi) := \mathbb{E}_{\mathbf{x}\sim p_\text{data}}\left[ \log p_\phi(\mathbf{x},T) \right]\)
- Sampling)
- Draw \(\mathbf{x}(0)\sim p_{\text{prior}}\).
- Compute \(\displaystyle\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T \mathbf{v}_{\phi^\times}(\mathbf{x}_{t}, t)\text{d}t\).
- Limit)
- ODE solver is expensive due to the integration.
- Flow Matching
- Idea)
- Relationship between \(\mathbf{x}_t,\;p_t,\;\Psi_{0\rightarrow t}\text{, and }\mathbf{v}_t\)
- \(\mathbf{x}_t\sim p_t\).
- \(\Psi_{0\rightarrow t}(\mathbf{x}) = \mathbf{x}_t\).
- \(\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{x}) = \mathbf{v}_t(\mathbf{x})\).
- \(\mathbf{v}_t(\mathbf{x})\)-prediction
- i.e.) Parameterize the velocity field, i.e. \(\mathbf{v}_\phi \approx \mathbf{v}_t\).
- Similarity with the NODE.
- Since \(\mathbf{v}_t(\mathbf{x})\) is intractable, use conditional density, conditional flow, and conditional velocity field instead
- i.e.) \(\mathbf{v}_\phi(\cdot\mid\mathbf{z}) \approx \mathbf{v}_t(\cdot\mid\mathbf{z})\).
- Training)
- Loss)
- \(\mathcal{L}_{\text{CFM}}(\phi) = \mathbb{E}_{t,\mathbf{z}\sim\pi(\mathbf{z}),\mathbf{x}_t\sim p_t(\cdot\mid\mathbf{z})}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z}) \Vert^2 \right]\), which differs from \(\mathcal{L}_{\text{FM}}(\phi)\) only by a constant $C$ independent of $\phi$.
- Implementation)
- Obtain \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z})\) by…
- setting $p_t(\cdot\mid\mathbf{z})$ (conditional probability path) first : Eulerian View
- setting $\Psi_{0\rightarrow t}(\cdot\mid\mathbf{z})$ (conditional flow) first : Lagrangian View
- We may choose the latent $\mathbf{z}$ by…
- Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
- One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
Prop.) Change-of-Variables Formula of Densities
- Prop.)
- For
- $\mathbf{f}$ : an invertible transformation
- $\mathbf{z}\sim p_{\text{prior}}$
- the density of $\mathbf{x} = \mathbf{f}(\mathbf{z})$ is
- \(p(\mathbf{x}) = p_{\text{prior}}(\mathbf{z})\displaystyle\left\vert\text{det}\frac{\partial \mathbf{f}^{-1}(\mathbf{x})}{\partial \mathbf{x}}\right\vert,\quad\text{where } \mathbf{z}=\mathbf{f}^{-1}(\mathbf{x})\).
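As a sanity check of the change-of-variables formula, here is a minimal sketch in one dimension. The affine map \(f(z) = az + b\) and the values of `a` and `b` are hypothetical choices made only because the pushed-forward density is then known in closed form, \(\mathcal{N}(b, a^2)\).

```python
import numpy as np

def gaussian_logpdf(z, mean=0.0, std=1.0):
    """Log-density of a 1-D Gaussian N(mean, std^2)."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((z - mean) / std) ** 2

# Hypothetical invertible map f(z) = a * z + b with a != 0 (illustration only).
a, b = 2.0, 1.0

def f_inv(x):
    return (x - b) / a

def log_density_via_change_of_variables(x):
    # log p(x) = log p_prior(f^{-1}(x)) + log |d f^{-1}(x) / dx|
    z = f_inv(x)
    log_abs_det = np.log(abs(1.0 / a))
    return gaussian_logpdf(z) + log_abs_det

x = np.linspace(-3.0, 5.0, 9)
lhs = log_density_via_change_of_variables(x)
# For z ~ N(0, 1), x = a z + b is exactly N(b, a^2), so both sides should agree.
rhs = gaussian_logpdf(x, mean=b, std=abs(a))
max_abs_error = float(np.max(np.abs(lhs - rhs)))
```

The two log-densities should agree up to floating-point error, which is the content of the proposition for this simple map.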
Intuition hozy note
5.1 Flow-Based Models: Normalizing Flows and Neural ODEs
Model) Normalizing Flows
Rezende and Mohamed, 2015
hozy note
- Def.)
- For
- \(p_{\text{data}}(\mathbf{x})\) : a complex data distribution
- \(p_{\text{prior}}(\mathbf{z})\) : a simple prior
- \(\mathbf{f}_\phi:\mathbb{R}^D\rightarrow\mathbb{R}^D\) with
- \(\mathbf{x} = f_\phi(\mathbf{z})\) and \(\mathbf{z}\sim p_{\text{prior}}\)
- Using the change-of-variables formula, we may get the model likelihood of
- \(\log p_\phi(\mathbf{x}) = \displaystyle\log p_{\text{prior}}(\mathbf{z}) + \log\left\vert\text{det}\frac{\partial f_\phi^{-1}(\mathbf{x})}{\partial\mathbf{x}}\right\vert\).
- Training Objective)
- We may learn the parameters $\phi$ by maximum likelihood, i.e. by maximizing
- \(\mathcal{L}_{\text{NF}}(\phi) = \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}}\left[\log p_\phi(\mathbf{x})\right]\).
- Limit)
- Simple linear transformation lacks expressiveness
- Sol.) A sequence of $L$ trainable invertible mappings \(\{\mathbf{f}_k\}_{k=0}^{L-1}\)
- Settings)
- \(\mathbf{f}_\phi = \mathbf{f}_{L-1}\circ\mathbf{f}_{L-2}\circ\cdots\circ\mathbf{f}_{0}\) where each \(\mathbf{f}_{k}\) is parameterized by a neural network
- \(\mathbf{x}_{k+1} = \mathbf{f}_{k}(\mathbf{x}_{k}),\quad k=0,\ldots,L-1,\quad \mathbf{z}=\mathbf{x}_{0}\sim p_{\text{prior}},\quad \mathbf{x}=\mathbf{x}_{L}\).
- Then we get the log-likelihood of
- \(\log p_\phi(\mathbf{x}) = \log p_{\text{prior}}(\mathbf{x}_{0}) - \displaystyle\sum_{k=0}^{L-1}\log\left\vert\text{det}\frac{\partial \mathbf{f}_{k}}{\partial \mathbf{x}_{k}}\right\vert\).
- Optimizing the above loss takes \(O(D^3)\) runtime due to computing the Jacobian determinant.
- Sol.) Using the Planar Flows or Residual Flows
- Sampling)
- Draw \(\mathbf{x}_0\sim p_{\text{prior}}\).
- Compute \(\mathbf{x} = \mathbf{f}_\phi(\mathbf{x}_0)\).
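The layered log-likelihood and the sampling procedure above can be sketched as follows. This is only an illustration under strong assumptions: each layer is a hypothetical elementwise affine map \(\mathbf{f}_k(\mathbf{x}) = e^{\mathbf{s}_k}\odot\mathbf{x} + \mathbf{b}_k\) with fixed random parameters (not a planar or residual flow), chosen so that the Jacobian is diagonal and the result can be checked against a closed-form Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L = 3, 4  # data dimension and number of layers (illustrative values)

# Hypothetical layer parameters: f_k(x) = exp(s_k) * x + b_k (elementwise affine).
log_scales = rng.normal(scale=0.3, size=(L, D))
shifts = rng.normal(size=(L, D))

def standard_normal_logpdf(z):
    return -0.5 * z.shape[-1] * np.log(2.0 * np.pi) - 0.5 * np.sum(z**2, axis=-1)

def sample(n):
    """Sampling: draw x_0 ~ p_prior, then push it through f_0, ..., f_{L-1}."""
    x = rng.normal(size=(n, D))
    for s, b in zip(log_scales, shifts):
        x = np.exp(s) * x + b
    return x

def log_prob(x):
    """log p(x) = log p_prior(x_0) - sum_k log|det d f_k / d x_k|."""
    total_log_det = 0.0
    # Invert the layers in reverse order to recover x_0 from x = x_L.
    for s, b in zip(log_scales[::-1], shifts[::-1]):
        x = (x - b) * np.exp(-s)
        total_log_det += np.sum(s)  # Jacobian of f_k is diag(exp(s_k))
    return standard_normal_logpdf(x) - total_log_det

x = sample(5)
model_log_prob = log_prob(x)

# Reference: the composition is itself an elementwise affine map, so
# x ~ N(shift_total, diag(scale_total^2)) in closed form.
scale_total = np.exp(log_scales.sum(axis=0))
shift_total = np.zeros(D)
for s, b in zip(log_scales, shifts):
    shift_total = np.exp(s) * shift_total + b
reference = (standard_normal_logpdf((x - shift_total) / scale_total)
             - np.sum(np.log(scale_total)))
```

Because the Jacobians here are diagonal, each log-determinant costs \(O(D)\). A dense Jacobian would bring back the \(O(D^3)\) cost noted above, which is what structured layers are designed to avoid.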
Concept) Residual Flow
Model) Neural ODEs
- Idea)
- Recall the discrete sequence of invertible mappings of Normalizing Flows of \(\mathbf{x}_{k+1} = \mathbf{f}_{k}(\mathbf{x}_{k})\).
- Using the parameterized velocity field, we may reconstruct it as
- \(\mathbf{x}_{k+1} = \mathbf{x}_{k} + \mathbf{v}_{\phi_k}(\mathbf{x}_{k}, k)\).
- This formulation corresponds to the Euler discretization of the continuous time ODE of \(\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\).
- In the limit of infinite layers and vanishing step size ($\Delta t\rightarrow0$), the discrete NF converges to a continuous model : Neural ODE
- Model)
- A continuous transformation
- \(\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{\phi}(\mathbf{x}_{t}, t),\quad t\in[0,T]\).
- where
- \(\mathbf{x}_{t}\in\mathbb{R}^D\) : the state at time $t$
- \(\mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\) : a neural network parameterized by $\phi$
- Using the Instantaneous Change-of-Variables Formula below, the log-density of $\mathbf{x}_T$ under the neural ODE is
- \(\displaystyle \log p_\phi(\mathbf{x}_{T}, T) = \log p_{\text{prior}}(\mathbf{x}_{0}) -\int_0^T \nabla_{\mathbf{x}}\cdot \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\text{d}t\).
- Training)
- Goal) \(p_\phi(\cdot, T) \approx p_{\text{data}}\)
- Loss)
- MLE of \(\mathcal{L}_\text{NODE}(\phi) := \mathbb{E}_{\mathbf{x}\sim p_\text{data}}\left[ \log p_\phi(\mathbf{x},T) \right]\)
- Limit)
- ODE solver is expensive due to the integration.
- The adjoint sensitivity method computes gradients via an auxiliary ODE with $O(1)$ memory complexity.
- Sampling)
- Draw \(\mathbf{x}(0)\sim p_{\text{prior}}\).
- Compute \(\displaystyle\mathbf{x}(T) = \mathbf{x}(0) + \int_0^T \mathbf{v}_{\phi^\times}(\mathbf{x}_{t}, t)\text{d}t\).
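The sampling integral can be approximated with the same Euler discretization that motivated the model. The sketch below assumes a hand-written linear field \(\mathbf{v}(\mathbf{x}, t) = -\mathbf{x}\) as a stand-in for a trained network \(\mathbf{v}_{\phi^\times}\), because its exact flow \(\mathbf{x}(T) = e^{-T}\mathbf{x}(0)\) is known. The names `euler_sample` and `toy_velocity` are illustrative only.

```python
import numpy as np

def euler_sample(velocity, x0, T=1.0, num_steps=1000):
    """Approximate x(T) = x(0) + int_0^T v(x_t, t) dt with forward Euler steps."""
    x = np.array(x0, dtype=float)
    dt = T / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity(x, t)  # x_{k+1} = x_k + dt * v(x_k, t_k)
    return x

# Stand-in for a trained network: v(x, t) = -x, whose exact flow is exp(-T) * x(0).
def toy_velocity(x, t):
    return -x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 2))         # x(0) ~ p_prior = N(0, I)
xT = euler_sample(toy_velocity, x0)  # approximate x(T) with T = 1
exact = np.exp(-1.0) * x0
```

In practice an adaptive or higher-order solver is typically used instead of plain Euler. The number of velocity evaluations per sample is what makes the ODE solve expensive.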
Concept) Instantaneous Change-of-Variables Formula
- Thm.)
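- \(\displaystyle\frac{\text{d}}{\text{d}t} \log p_\phi(\mathbf{x}_{t}, t) = -\nabla_{\mathbf{x}}\cdot \mathbf{v}_{\phi}(\mathbf{x}_{t}, t)\), i.e. the log-density along a trajectory changes at the rate of the negative divergence of the velocity field.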
- cf.)
- A continuous version of Change-of-Variables Formula
- Chen et al., 2018
- A special case of the Fokker-Planck equation
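To see the formula at work, the following sketch integrates the state and its log-density together. It assumes a hypothetical diagonal linear field \(\mathbf{v}(\mathbf{x}, t) = \mathbf{a}\odot\mathbf{x}\), for which the divergence is \(\sum_i a_i\) and the exact density at time \(T\) is Gaussian, so the result can be compared with a closed form.

```python
import numpy as np

a = np.array([0.5, -0.3])  # hypothetical diagonal rates; v(x, t) = a * x

def velocity(x, t):
    return a * x

def divergence(x, t):
    # div_x v = sum_i d v_i / d x_i = sum(a) for this linear field
    return np.sum(a)

def standard_normal_logpdf(z):
    return -0.5 * z.shape[-1] * np.log(2.0 * np.pi) - 0.5 * np.sum(z**2, axis=-1)

def integrate_with_log_density(x0, T=1.0, num_steps=2000):
    """Jointly Euler-integrate dx/dt = v and d(log p)/dt = -div v."""
    x = np.array(x0, dtype=float)
    log_p = standard_normal_logpdf(x)  # log p_prior(x_0)
    dt = T / num_steps
    for k in range(num_steps):
        t = k * dt
        log_p = log_p - dt * divergence(x, t)
        x = x + dt * velocity(x, t)
    return x, log_p

x0 = np.array([[0.3, -1.2], [1.0, 0.5]])
xT, log_p_T = integrate_with_log_density(x0)

# Closed form for this toy field: x_T = exp(a T) * x_0, so p_T = N(0, diag(exp(2 a T))).
std_T = np.exp(a * 1.0)
analytic = standard_normal_logpdf(xT / std_T) - np.sum(np.log(std_T))
```

For a neural velocity field the divergence is not available in closed form, so it has to be computed or estimated from the Jacobian of the network. The analytic `divergence` here is specific to this toy example.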
Desc.) Physical Interpretation of the Continuity Equation
From Appendix B
5.2. Flow Matching Framework
- Settings)
- \(\{p_t\}_{t\in[0,1]}\) : a predefined probability path s.t.
- Boundary settings
- \(p_0 = p_{\text{src}},\quad p_1 = p_{\text{tgt}}\).
- Marginal Density
- For a latent variable $\mathbf{z}\sim\pi(\mathbf{z})$ drawn from an unknown distribution
- Here, the common choices for $\mathbf{z}$ are
- Two-sided conditioning : \(\mathbf{z}=(\mathbf{x}_0, \mathbf{x}_1)\sim p_{\text{src}}(\mathbf{x}_0) p_{\text{tgt}}(\mathbf{x}_1)\)
- One-sided conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
- we may denote the marginal density $p_t(\mathbf{x}_t)$ as
- \(p_t(\mathbf{x}_t) = \displaystyle\int p_t(\mathbf{x}_t\mid\mathbf{z})\pi(\mathbf{z})\text{d}\mathbf{z}\) with $(\pi(\mathbf{z}), {p_t(\cdot\mid\mathbf{z})})$ chosen to satisfy the boundary conditions.
- \(\mathbf{v}_t(\mathbf{x}_t)\) : a time-dependent vector(velocity) field whose associated ODE flow matches \(\{p_t\}_{t\in[0,1]}\)
- s.t. induced ODE enables a sample-wise transformation of
- \(\displaystyle\frac{\text{d}\mathbf{x}_{t}}{\text{d}t} = \mathbf{v}_{t}(\mathbf{x}(t)),\quad t\in[0,1]\).
- \(\mathbf{x}(t)\sim p_t\).
- or, equivalently captured with the Continuity Equation of
- \(\displaystyle\frac{\partial p_t(\mathbf{x})}{\partial t} + \nabla\cdot(\mathbf{v}_t(\mathbf{x})p_t(\mathbf{x})) = 0\).
- Any \(\mathbf{v}_t\) that satisfies the above Continuity Equation is allowed!
- Training)
- $\mathbf{v}$-prediction
- \(\mathcal{L}_{\text{FM}}(\phi) = \mathbb{E}_{t,\mathbf{x}_t\sim p_t}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t) \Vert^2 \right]\).
- However, $\mathbf{v}_t(\mathbf{x}_t)$ is intractable.
- As in DDPM and DSM, we may instead regress onto the conditional velocity field $\mathbf{v}_t(\mathbf{x}\mid\mathbf{z})$ as follows:
\(\begin{aligned} \mathcal{L}_{\text{FM}}(\phi) &= \underbrace{\mathbb{E}_{t,\mathbf{z}\sim\pi(\mathbf{z}),\mathbf{x}_t\sim p_t(\cdot\mid\mathbf{z})}\left[ \Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z}) \Vert^2 \right]}_{\triangleq\,\mathcal{L}_{\text{CFM}}(\phi)} + C \\ &= \mathcal{L}_{\text{CFM}}(\phi) + C \\ \end{aligned}\), where $C$ does not depend on $\phi$.
- To obtain \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{z})\), we may choose either
- to set $p_t(\cdot\mid\mathbf{z})$ (conditional probability path) first : Eulerian View
- to set $\Psi_{0\rightarrow t}(\cdot\mid\mathbf{z})$ (conditional flow) first : Lagrangian View
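A Monte Carlo estimate of \(\mathcal{L}_{\text{CFM}}\) can be sketched as below. The sketch makes assumptions the text does not fix: two-sided conditioning \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\), a deterministic straight-line path \(\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1\) (so the conditional velocity is \(\mathbf{x}_1 - \mathbf{x}_0\)), and a placeholder `zero_model` instead of a neural network.

```python
import numpy as np

def cfm_loss(v_phi, x0, x1, t):
    """Monte Carlo estimate of the CFM loss for the straight-line conditional path.

    v_phi : callable (x_t, t) -> predicted velocity, with t of shape (n, 1)
    x0, x1: arrays of shape (n, D) drawn from p_src and p_tgt
    t     : array of shape (n,) drawn from U[0, 1]
    """
    t = t[:, None]                 # broadcast time over the feature axis
    x_t = (1.0 - t) * x0 + t * x1  # point on the conditional path
    target = x1 - x0               # conditional velocity of the straight-line path
    residual = v_phi(x_t, t) - target
    return float(np.mean(np.sum(residual**2, axis=-1)))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 2))        # x_0 ~ p_src (standard normal here)
x1 = rng.normal(size=(8, 2)) + 3.0  # x_1 ~ p_tgt (hypothetical shifted Gaussian)
t = rng.uniform(size=8)             # t ~ U[0, 1]

# Placeholder model that always predicts zero velocity.
def zero_model(x_t, t):
    return np.zeros_like(x_t)

loss_zero_model = cfm_loss(zero_model, x0, x1, t)
```

The regression target only needs samples of \(\mathbf{z}\) and \(\mathbf{x}_t\), never the intractable marginal \(\mathbf{v}_t(\mathbf{x}_t)\), which is the point of replacing \(\mathcal{L}_{\text{FM}}\) with \(\mathcal{L}_{\text{CFM}}\).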
Concept) Eulerian View
- Goal)
- Construct $p_t(\cdot\mid\mathbf{z})$ first and then derive the corresponding $\mathbf{v}_t(\cdot\mid\mathbf{z})$.
- E.g.) Both source and target distribution are Gaussian.
- Procedure)
- Identify the marginal density \(p_t(\mathbf{x}_t)\).
- Choose a marginal flow map \(\Psi_{0\rightarrow t}(\mathbf{x})\).
- Derive the marginal velocity field \(\mathbf{v}_t(\mathbf{x})\) from \(\Psi_{0\rightarrow t}(\mathbf{x})\).
- Show that the closed-form characterization on the marginal is valid on the conditional as well.
- Two strategies for choosing the $\mathbf{z}$ from $p_t(\cdot\mid\mathbf{z})$
- Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
- One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
- Derivation)
- Identifying the marginal \(p_t(\mathbf{x}_t)\).
- Boundary conditions
- \(p_{\text{src}} = p_0 = \mathcal{N}\left(\mathbf{x};\;\mu(0),\sigma^2(0)\mathbf{I}\right)\).
- \(p_{\text{tgt}} = p_1 = \mathcal{N}\left(\mathbf{x};\;\mu(1),\sigma^2(1)\mathbf{I}\right)\).
- Then the marginal density can be denoted as the interpolation of
- \(p_t(\mathbf{x}_t) = \mathcal{N}\left(\mathbf{x}_t;\;\mu(t),\sigma^2(t)\mathbf{I}\right)\).
- And, \(\{p_t\}_{t\in[0,1]}\) connects \(p_{\text{src}}\) and \(p_{\text{tgt}}\).
- Choosing the marginal flow map \(\Psi_{0\rightarrow t}(\mathbf{x})\).
- There are many velocity fields that induce an ODE flow s.t. $\mathbf{x}\sim p_0$ implies \(\Psi_{0\rightarrow t}(\mathbf{x})\sim p_t\).
- We may choose a flow map s.t.
- \(\displaystyle\Psi_{0\rightarrow t}(\mathbf{x}) := \mu(t) + \sigma(t)\left(\frac{\mathbf{x}-\mu(0)}{\sigma(0)}\right)\).
- Since the flow map should be invertible, we have
- \(\forall\mathbf{y} = \Psi_{0\rightarrow t}(\mathbf{x}),\quad\exists\Psi_{0\rightarrow t}^{-1}:\mathbb{R}^D\rightarrow\mathbb{R}^D\text{ s.t. }\mathbf{x} = \Psi_{0\rightarrow t}^{-1}(\mathbf{y}) = \Psi_{t\rightarrow 0}(\mathbf{y})\).
- Deriving the velocity field.
- \(\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\).
- i.e.) \(\mathbf{v}_t = \partial_t \Psi_{0\rightarrow t} \circ \Psi^{-1}_{0\rightarrow t}\).
- Why?)
- Letting $\mathbf{x}_0$ be the initial input, we get
- \(\displaystyle\Psi_{0\rightarrow t}(\mathbf{x}_0) = \mu(t) + \sigma(t)\left(\frac{\mathbf{x}_0-\mu(0)}{\sigma(0)}\right)\quad\cdots\quad(A)\).
- \(\displaystyle\frac{\text{d}}{\text{d}t} (\Psi_{0\rightarrow t}(\mathbf{x}_0)) = \mathbf{v}_t (\Psi_{0\rightarrow t}(\mathbf{x}_0))\quad\cdots\quad(B)\).
- For some $\hat{\mathbf{x}}$,
- we may rewrite (A) as
\(\begin{aligned} \mathbf{x}_0 &= \Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})&\cdots\quad(C1)&\quad(\because\text{Invertibility}) \\ &= \mu(0) + \sigma(0)\left(\frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\right) &\cdots\quad(C2) \end{aligned}\).
- which is equivalent to \(\displaystyle\frac{\mathbf{x}_0-\mu(0)}{\sigma(0)} = \frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\quad\cdots\quad(C3)\)
- we may rewrite (B) as
- \(\displaystyle\frac{\text{d}}{\text{d}t} (\Psi_{0\rightarrow t}(\mathbf{x}_0)) = \mathbf{v}_t (\hat{\mathbf{x}})\quad\cdots\quad(D)\).
- Plugging (C1) into (D), we may get
- \(\displaystyle\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})) = \mathbf{v}_t(\hat{\mathbf{x}})\quad\cdots\quad(E)\).
- From our definition, we may also get
- \(\displaystyle \frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{x}) = \mu'(t) + \sigma'(t)\left(\frac{\mathbf{x}-\mu(0)}{\sigma(0)}\right)\) given $\mathbf{x}\quad\cdots(F)$
- Replacing \(\mathbf{x}\) in (F) with \(\mathbf{x}_0 = \Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})\) from (C1,2), we get \(\begin{aligned} \frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}\bigg(\overbrace{\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})}^{=\mathbf{x}_0}\bigg) &= \mu'(t) + \sigma'(t)\Big(\frac{\overbrace{\mathbf{x}_0}^{=\Psi_{0\rightarrow t}^{-1}(\hat{\mathbf{x}})}-\mu(0)}{\sigma(0)}\Big) & (\text{From } (C1,2)) \\ &= \mu'(t) + \sigma'(t)\Big(\frac{\hat{\mathbf{x}}-\mu(t)}{\sigma(t)}\Big) & (\because (C3)) \\ &= \frac{\sigma'(t)}{\sigma(t)}(\hat{\mathbf{x}}-\mu(t)) + \mu'(t) \\ &= \mathbf{v}_t(\hat{\mathbf{x}}) & (\because (E)) \end{aligned}\).
- Again replacing $\hat{\mathbf{x}}$ with $\mathbf{x}$ we get
- \(\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\).
- Showing the closed-form in marginal also works in conditional.
- Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
- Settings)
- \(\mathbf{z}\sim \pi(\mathbf{z}) := p_{\text{src}}(\mathbf{x}_0) p_{\text{tgt}}(\mathbf{x}_1)\).
- \(p_t(\cdot\mid\mathbf{z}=(\mathbf{x}_0,\mathbf{x}_1)) := \mathcal{N}(\mathbf{x}_t;\; a_t \mathbf{x}_0 + b_t\mathbf{x}_1, \sigma^2\mathbf{I})\).
- where $a_t+b_t = 1$, i.e. the linear interpolation between \(\mathbf{x}_0, \mathbf{x}_1\)
- Deriving $\mathbf{v}_t(\cdot\mid\mathbf{z})$
- Then applying the derivation of $\mathbf{v}_t$ in marginal, we may get
- \(\mathbf{v}_t(\mathbf{x}\mid\mathbf{z}) = a_t' \mathbf{x}_0 + b_t' \mathbf{x}_1\).
- CFM Loss
- \(\mathcal{L}_{\text{CFM}} = \mathbb{E}_{t, \mathbf{x}_0\sim p_{\text{src}}, \mathbf{x}_1\sim p_{\text{tgt}}}\left\Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - (a_t' \mathbf{x}_0 + b_t' \mathbf{x}_1) \right\Vert^2\).
- One-Sided Conditioning : Consider the case of \(\mathbf{z} = \mathbf{x}_1\)
- Settings)
- \(\mathbf{z}\sim \pi(\mathbf{z}) := p_{\text{data}}(\mathbf{x}_1)\).
- \(p_t(\cdot\mid\mathbf{z}=\mathbf{x}_1) := \mathcal{N}(\mathbf{x}_t;\; b_t \mathbf{x}_1, a_t^2\mathbf{I})\).
- where \(\begin{cases} a_0 = 1, b_0 = 0 \\ a_1 = 0, b_1 = 1 \\ \end{cases} \Leftrightarrow \begin{cases} p_0(\cdot\mid\mathbf{z}=\mathbf{x}_1) = \mathcal{N}(\cdot;\; \mathbf{0},\mathbf{I}) \\ p_1(\cdot\mid\mathbf{z}=\mathbf{x}_1) = \delta(\cdot - \mathbf{x}_1) \\ \end{cases}\)
- Deriving $\mathbf{v}_t(\cdot\mid\mathbf{z})$
- Then applying the derivation of $\mathbf{v}_t$ in marginal, we may get
- \(\mathbf{v}_t(\mathbf{x}\mid\mathbf{x}_1) = b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x} - b_t\mathbf{x}_1)\).
- CFM Loss
- \(\mathcal{L}_{\text{CFM}} = \mathbb{E}_{t, \mathbf{x}_1\sim p_{\text{tgt}}, \mathbf{x}_t\sim p_t(\cdot\mid\mathbf{x}_1)}\left\Vert \mathbf{v}_\phi(\mathbf{x}_t, t) - \left[ b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x}_t - b_t\mathbf{x}_1) \right] \right\Vert^2\).
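The Eulerian derivation can be checked numerically: integrating the ODE driven by \(\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\) should reproduce the flow map \(\Psi_{0\rightarrow t}\). The schedules \(\mu(t) = 2t\) and \(\sigma(t) = 1 - 0.5t\) below are hypothetical choices for illustration, not schedules from the text.

```python
import numpy as np

# Hypothetical schedules for the Gaussian path p_t = N(mu(t), sigma(t)^2 I).
def mu(t):
    return 2.0 * t

def dmu(t):
    return 2.0

def sigma(t):
    return 1.0 - 0.5 * t

def dsigma(t):
    return -0.5

def flow_map(x0, t):
    """Psi_{0->t}(x) = mu(t) + sigma(t) * (x - mu(0)) / sigma(0)."""
    return mu(t) + sigma(t) * (x0 - mu(0.0)) / sigma(0.0)

def velocity(x, t):
    """v_t(x) = sigma'(t) / sigma(t) * (x - mu(t)) + mu'(t)."""
    return dsigma(t) / sigma(t) * (x - mu(t)) + dmu(t)

rng = np.random.default_rng(0)
x0 = mu(0.0) + sigma(0.0) * rng.normal(size=(5, 2))  # x_0 ~ p_0

# Integrate dx/dt = v_t(x) from t = 0 to t = 1 with forward Euler.
num_steps = 2000
dt = 1.0 / num_steps
x = x0.copy()
for k in range(num_steps):
    x = x + dt * velocity(x, k * dt)

x1_closed_form = flow_map(x0, 1.0)
max_abs_error = float(np.max(np.abs(x - x1_closed_form)))
```

The integrated endpoints should match \(\Psi_{0\rightarrow 1}(\mathbf{x}_0)\) closely, consistent with \(\mathbf{v}_t = \partial_t \Psi_{0\rightarrow t} \circ \Psi^{-1}_{0\rightarrow t}\).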
Concept) Lagrangian View
- Goal)
- Design the conditional flow map $\Psi_{0\rightarrow t}(\cdot;\mathbf{z})$ first and directly obtain the corresponding $\mathbf{v}_t(\cdot\mid\mathbf{z})$ by differentiating along particle trajectories, i.e. \(\frac{\text{d}}{\text{d}t}\Psi_{0\rightarrow t}(\mathbf{y};\mathbf{z}) = \mathbf{v}_t\Big(\underbrace{\Psi_{0\rightarrow t}(\mathbf{y};\mathbf{z})}_{\mathbf{x}}\mid\mathbf{z}\Big)\)
- e.g.) Conditional Affine Flow
- Settings)
- \(\mathbf{z}\sim\pi\) : a latent variable
- which can be (but not limited to) either…
- Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
- One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
- Refer to Eulerian View for more details.
- \(\Psi_{0\rightarrow t}(\mathbf{x}_0;\mathbf{z}) := \mu_t(\mathbf{z}) + \mathbf{A}_t(\mathbf{z})\mathbf{x}_0,\quad t\in[0,1]\) : the time-varying conditional affine flow
- where
- \(\mu_t(\mathbf{z})\in\mathbb{R}^D\).
- cf.) \(\mu_0(\mathbf{z}) = \mathbf{0}\)
- \(\mathbf{A}_t(\mathbf{z})\in\mathbb{R}^{D\times D}\) is invertible for \(t\in[0,1]\)
- cf.) \(\mathbf{A}_0(\mathbf{z}) = \mathbf{I}\)
- Induce the Conditional Path \(p_t(\cdot\mid\mathbf{z})\) :
- \(p_t(\cdot\mid\mathbf{z}) = [\Psi_t(\cdot;\mathbf{z})]_\# p_0\).
- cf.) \(\#\) means pushing the density forward which is equivalent to the change-of-variables
- i.e.) \(p_t(\mathbf{x}_t) = p_0(\Psi_t^{-1}(\mathbf{x}_t)) \left\vert \det \nabla_{\mathbf{x}_t} \Psi_t^{-1}(\mathbf{x}_t) \right\vert\)
- Prop.)
- Since $\Psi_t(\cdot;\mathbf{z})$ is affine, if $p_0$ is Gaussian, then $p_t(\cdot\mid\mathbf{z})$ is Gaussian as well.
- \(p_t(\cdot) = \displaystyle\int p_t(\cdot\mid\mathbf{z})\pi(\mathbf{z})\text{d}\mathbf{z}\).
- Derive the Conditional Velocity \(\mathbf{v}_t(\cdot\mid\mathbf{z})\) :
- Applying the derivation result \(\displaystyle\mathbf{v}_t(\mathbf{x}) = \frac{\sigma'(t)}{\sigma(t)}(\mathbf{x}-\mu(t)) + \mu'(t)\) from the Eulerian View, we may get
- \(\mathbf{v}_t(\mathbf{x}\mid\mathbf{z}) = \mu_t'(\mathbf{z}) + \mathbf{A}_t'(\mathbf{z})\mathbf{A}_t(\mathbf{z})^{-1}(\mathbf{x}-\mu_t(\mathbf{z}))\).
- Depending on the conditioning strategy…
- One-Sided Conditioning : \(\mathbf{z} = \mathbf{x}_0\) or \(\mathbf{z} = \mathbf{x}_1\)
- We may set $\Psi_t$ as
\(\begin{cases} \mu_t(\mathbf{z}) = b_t\mathbf{z} \\ \mathbf{A}_t(\mathbf{z}) = a_t \mathbf{I} \\ \end{cases},\quad \begin{cases} a_0 = 1, b_0 = 0 \\ a_1 = 0, b_1 = 1 \\ \end{cases}\).
- Then, for \(\mathbf{z} = \mathbf{x}_1\), we have
- \(\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1,\quad \mathbf{v}_t(\mathbf{x}\mid\mathbf{x}_1) = b_t'\mathbf{x}_1 + \frac{a_t'}{a_t}(\mathbf{x} - b_t \mathbf{x}_1)\).
- Thus,
- \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{x}_1) = a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1\).
- Two-Sided Conditioning : \(\mathbf{z} = (\mathbf{x}_0, \mathbf{x}_1)\)
- We may set $\Psi_t$ as
\(\begin{cases} \mu_t(\mathbf{x}_0, \mathbf{x}_1) = b_t\mathbf{x}_1 \\ \mathbf{A}_t(\mathbf{x}_0, \mathbf{x}_1) = a_t \mathbf{I} \\ \end{cases}\).
- Then, we have
- \(\mathbf{x}_t = a_t\mathbf{x}_0 + b_t\mathbf{x}_1,\quad p_t(\cdot\mid\mathbf{x}_0, \mathbf{x}_1) = \delta(\cdot - (a_t\mathbf{x}_0 + b_t\mathbf{x}_1))\).
- Thus,
- \(\mathbf{v}_t(\mathbf{x}_t\mid\mathbf{x}_0,\mathbf{x}_1) = a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1\).
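The conditional affine flow can be checked numerically as well. This sketch assumes the schedule \(a_t = 1 - t\), \(b_t = t\) with one-sided conditioning \(\mathbf{z} = \mathbf{x}_1\) (one admissible choice, not the only one). It evaluates the closed-form velocity at \(\mathbf{x}_t = \Psi_{0\rightarrow t}(\mathbf{x}_0;\mathbf{z})\) and compares it with \(a_t'\mathbf{x}_0 + b_t'\mathbf{x}_1\) and with a finite difference of the flow map in \(t\).

```python
import numpy as np

def conditional_velocity(x, mu_t, dmu_t, A_t, dA_t):
    """v_t(x | z) = mu_t'(z) + A_t'(z) A_t(z)^{-1} (x - mu_t(z))."""
    return dmu_t + dA_t @ np.linalg.solve(A_t, x - mu_t)

# Assumed schedule (not fixed by the text): a_t = 1 - t, b_t = t.
def a(t):
    return 1.0 - t

def b(t):
    return t

da, db = -1.0, 1.0  # time derivatives a_t' and b_t'

rng = np.random.default_rng(0)
D = 3
x0 = rng.normal(size=D)  # x_0 ~ p_0
x1 = rng.normal(size=D)  # z = x_1 (one-sided conditioning)
t = 0.4                  # any t in [0, 1) keeps A_t = a_t I invertible

mu_t, dmu_t = b(t) * x1, db * x1
A_t, dA_t = a(t) * np.eye(D), da * np.eye(D)

def psi(s):
    """Conditional flow Psi_{0->s}(x_0; z) = mu_s(z) + A_s(z) x_0."""
    return b(s) * x1 + a(s) * x0

x_t = psi(t)
v_closed_form = conditional_velocity(x_t, mu_t, dmu_t, A_t, dA_t)
v_along_path = da * x0 + db * x1  # a_t' x_0 + b_t' x_1

# Cross-check with a central finite difference of the flow map in time.
eps = 1e-6
v_finite_diff = (psi(t + eps) - psi(t - eps)) / (2.0 * eps)
```

All three quantities should agree, which illustrates why the one-sided and two-sided constructions give the same regression target along a trajectory. At \(t = 1\) the matrix \(\mathbf{A}_t\) becomes singular under this schedule, so the closed form is only evaluated for \(t < 1\).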
Model) Rectified Flow
- Refer to the previous note on RF