Generative Modeling via Drifting
Hozy Summary
Concept) Drifting Model for Generation
- Setting)
- \(f : \mathbb{R}^C\rightarrow\mathbb{R}^D\) : a neural network with
- input : \(\epsilon\sim p_\epsilon\)
- output : \(\mathbf{x} = f(\epsilon) \in\mathbb{R}^D\), whose distribution is denoted \(q\), i.e.
- \(\mathbf{x} = f(\epsilon)\sim q\).
- Desc.)
- Pushforward relation between \(p_\epsilon\) and \(q\).
- \(q = f_{\#}p_\epsilon\).
- We want to find \(f\) s.t. \(f_{\#}p_\epsilon\approx p_{\text{data}}\)
- Drift \(\Delta\mathbf{x}\)
- Def.)
- Assume an iterative training process on \(f\)
- where
- \(\{f_i\}\) : the sequence of models produced at training iterations \(i\)
- \(\{q_i\}\) : the corresponding output distributions, \(q_i = [f_i]_{\#}p_\epsilon\)
- Then the drift of a sample \(\mathbf{x}_i = f_{i}(\epsilon)\) is its change between consecutive iterations, for the same \(\epsilon\)
- i.e.) \(\Delta\mathbf{x}_i := \mathbf{x}_{i+1}-\mathbf{x}_i = f_{i+1}(\epsilon)-f_{i}(\epsilon)\)
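To make the definition concrete, here is a minimal NumPy sketch. The two affine maps `f_i` and `f_next` are hypothetical stand-ins for consecutive training iterates (the notes do not specify the architecture), and \(p_\epsilon\) is assumed to be a standard Gaussian. The point is only that the drift is measured per sample, with the same \(\epsilon\) fed to both iterates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generator iterates f_i and f_{i+1}: affine maps R^C -> R^D.
C, D = 4, 2
W_i = rng.normal(size=(D, C))
W_next = W_i + 0.1 * rng.normal(size=(D, C))  # weights after one pretend update

def f_i(eps):
    return eps @ W_i.T

def f_next(eps):
    return eps @ W_next.T

eps = rng.normal(size=(5, C))   # eps ~ p_eps, assumed N(0, I)
x_i = f_i(eps)                  # samples from q_i = [f_i]_# p_eps
x_next = f_next(eps)            # samples from q_{i+1}, same eps
drift = x_next - x_i            # Delta x_i, one row per sample

print(drift.shape)
```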
- Drifting Field \(\mathbf{V}_{p,q}(\cdot)\)
- Def.)
- A function that returns the drift \(\Delta\mathbf{x}\) of a sample \(\mathbf{x} = f(\epsilon)\):
- \(\mathbf{V}_{p,q} : \mathbb{R}^D\rightarrow\mathbb{R}^D\) s.t.
- \(\mathbf{V}_{p,q}(\mathbf{x}_{i}) = \Delta \mathbf{x}_{i} = \mathbf{x}_{i+1} - \mathbf{x}_{i}\).
- Or, \(\mathbf{x}_{i+1} = \mathbf{x}_{i} + \mathbf{V}_{p,q}(\mathbf{x}_{i})\)
- Here, \(p\) is the target distribution, e.g. \(p=p_{\text{data}}\), and \(q\) is the current generated distribution, e.g. \(q=q_i\)
- Instantiation)
- Goal)
- Train \(f\) until all \(\mathbf{x}\) stop drifting
- i.e.) \(\mathbf{V}=\mathbf{0}\)
- Method)
- Anti-Symmetric Drifting Field
- Def.)
- \(\mathbf{V}_{p,q}(\mathbf{x}) = -\mathbf{V}_{q,p}(\mathbf{x}),\quad\forall\mathbf{x}\).
- Prop.)
- Then we have
- \(q=p\Rightarrow \mathbf{V}_{p,q}(\mathbf{x}) = \mathbf{0},\quad\forall\mathbf{x}\).
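- Why?) A short derivation sketch, using only the anti-symmetry definition above:
\(\begin{aligned} q = p \;&\Rightarrow\; \mathbf{V}_{p,p}(\mathbf{x}) = -\mathbf{V}_{p,p}(\mathbf{x}) \\ &\Rightarrow\; 2\,\mathbf{V}_{p,p}(\mathbf{x}) = \mathbf{0} \\ &\Rightarrow\; \mathbf{V}_{p,p}(\mathbf{x}) = \mathbf{0},\quad\forall\mathbf{x} \end{aligned}\).
- Note) This gives only the direction \(q=p\Rightarrow\mathbf{V}=\mathbf{0}\); the converse does not follow from anti-symmetry alone (e.g. \(\mathbf{V}\equiv\mathbf{0}\) is trivially anti-symmetric).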
- Training Objective)
- Data space prediction
- \(\mathcal{L} = \mathbb{E}_{\epsilon}\left[ \left\Vert \underbrace{f_\theta(\epsilon)}_{\text{pred}} - \underbrace{\text{stopgrad}\left( f_\theta(\epsilon) + \mathbf{V}_{p, q_\theta}(f_\theta(\epsilon)) \right)}_{\text{frozen target}} \right\Vert^2 \right]\).
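- Recall that \(\mathbf{V}=\mathbf{0}\) is the stopping condition.
- \(\mathbf{V}\) is not differentiated directly (no gradient flows through it).
- Why?) \(\mathbf{V}\) depends on the generated distribution \(q_\theta\), and backpropagating through that dependence is non-trivial.
- Instead, optimize indirectly: move the prediction \(\mathbf{x}=f_\theta(\epsilon)\) toward the frozen target \(\mathbf{x}+\mathbf{V}_{p,q_\theta}(\mathbf{x})\)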
- Feature space prediction
- \(\mathcal{L} = \mathbb{E}\left[ \left\Vert \phi(\mathbf{x}) - \text{stopgrad}\left( \phi(\mathbf{x}) + \mathbf{V}(\phi(\mathbf{x})) \right) \right\Vert^2 \right]\).
- where
- \(\mathbf{x} = f_\theta(\epsilon)\).
- \(\phi\) : a feature extractor mapping samples to feature space
- \(\mathbf{V}\) is defined on the features \(\{\phi(\mathbf{x})\mid\mathbf{x} = f_\theta(\epsilon),\ \epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})\}\).
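The data-space objective can be illustrated with a small 1-D toy. This is a sketch under loud assumptions, not the method as implemented by the authors: the generator is a hypothetical shift model \(f_\theta(\epsilon)=\theta+\epsilon\), the target is a toy \(p_{\text{data}}=\mathcal{N}(3,1)\), the drift is a batch estimate of the kernel form \(\mathbf{V}^+_p-\mathbf{V}^-_q\) with an assumed \(\tau=1\), and the gradient is written out by hand instead of using autodiff with a stop-gradient. With the target frozen, the loss value equals the mean of \(\Vert\mathbf{V}\Vert^2\), and its gradient with respect to the prediction is \(-2\mathbf{V}\), so each step moves samples along the drift.

```python
import numpy as np

def kernel(x, y, tau):
    # k(x, y) = exp(-|x - y| / tau) for 1-D samples; shape (len(x), len(y)).
    return np.exp(-np.abs(x[:, None] - y[None, :]) / tau)

def drift(x, y_pos, y_neg, tau=1.0):
    # Batch estimate of V_{p,q}(x) = V_p^+(x) - V_q^-(x) in 1-D.
    k_pos = kernel(x, y_pos, tau)
    k_neg = kernel(x, y_neg, tau)
    v_pos = (k_pos * (y_pos[None, :] - x[:, None])).sum(1) / k_pos.sum(1)
    v_neg = (k_neg * (y_neg[None, :] - x[:, None])).sum(1) / k_neg.sum(1)
    return v_pos - v_neg

rng = np.random.default_rng(0)
theta = 0.0            # hypothetical generator: f_theta(eps) = theta + eps
lr, batch = 0.1, 256   # assumed hyperparameters for this toy

for step in range(300):
    eps = rng.normal(size=batch)
    x = theta + eps                        # pred = f_theta(eps)
    y_pos = 3.0 + rng.normal(size=batch)   # toy p_data = N(3, 1)
    v = drift(x, y_pos, x)                 # negatives are the generated batch itself
    target = x + v                         # frozen target, treated as a constant
    loss = np.mean((x - target) ** 2)      # numerically equals mean(v**2)
    # d(loss)/d(theta) with the target frozen and dx/dtheta = 1:
    grad = np.mean(2.0 * (x - target))     # = -2 * mean(v)
    theta -= lr * grad

print(theta, loss)
```

In this toy the shift model can represent the target exactly, so \(\theta\) should settle near 3 and the drift should shrink toward zero up to batch noise. With a real network, the hand-written gradient would be replaced by backpropagation through \(f_\theta\) only, keeping the target detached.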
- Implementation Details)
- Classifier-free guidance (CFG)
Tech.) Drift Field : Mean-Shift Method with Attraction and Repulsion
- Model)
\(\begin{aligned} \mathbf{V}_{p,q}(\mathbf{x}) &= \mathbb{E}_{\mathbf{y}^+\sim p}\, \mathbb{E}_{\mathbf{y}^-\sim q} \left[ \mathcal{K}(\mathbf{x},\mathbf{y}^+,\mathbf{y}^-) \right] \\ &= \mathbf{V}_{p}^+(\mathbf{x}) - \mathbf{V}_{q}^-(\mathbf{x}) \\ &= \frac{1}{Z_p}\mathbb{E}_{p}\left[ k(\mathbf{x}, \mathbf{y}^+) (\mathbf{y}^+ - \mathbf{x}) \right] - \frac{1}{Z_q}\mathbb{E}_{q}\left[ k(\mathbf{x}, \mathbf{y}^-) (\mathbf{y}^- - \mathbf{x}) \right] \\ &= \frac{1}{Z_pZ_q}\mathbb{E}_{p,q}\left[ k(\mathbf{x}, \mathbf{y}^+) k(\mathbf{x}, \mathbf{y}^-) (\mathbf{y}^+ - \mathbf{y}^-) \right] \end{aligned}\).
- where
- \(\displaystyle k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{1}{\tau}\Vert\mathbf{x}-\mathbf{y}\Vert\right)\) : a kernel function
- for
- \(\tau\) : the temperature
- \(\Vert\cdot\Vert\) : the \(\ell_2\) norm, so \(\Vert\mathbf{x}-\mathbf{y}\Vert\) is the \(\ell_2\) distance
- \(\mathbf{y}^+\sim p_{\text{data}}\).
- \(\mathbf{y}^-=f(\epsilon)\sim q\) where \(\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})\).
- \(Z_p = \mathbb{E}_p\left[k(\mathbf{x}, \mathbf{y}^+)\right]\).
- \(Z_q = \mathbb{E}_q\left[k(\mathbf{x}, \mathbf{y}^-)\right]\).
- Desc.)
- Samples from \(p_{\text{data}}\) attract \(\mathbf{x}\) (the \(\mathbf{V}_p^+\) term).
- Samples from \(q\) repel \(\mathbf{x}\) (the \(\mathbf{V}_q^-\) term).
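The model above can be estimated from finite batches. The following NumPy sketch is one possible reading of the formula rather than a reference implementation: the expectations are replaced by batch averages, `mean_shift` and `drift_field` are hypothetical helper names, the Gaussian batches are stand-ins for samples of \(p_{\text{data}}\) and \(q\), and \(\tau=1\) is an assumed default.

```python
import numpy as np

def mean_shift(x, y, tau):
    # (1/Z) * E_y[ k(x, y) (y - x) ], estimated from a batch y.
    # x: (N, D) query points, y: (M, D) samples.
    diff = y[None, :, :] - x[:, None, :]               # (N, M, D), entries y_j - x_i
    k = np.exp(-np.linalg.norm(diff, axis=-1) / tau)   # (N, M), k(x_i, y_j)
    z = k.sum(axis=1, keepdims=True)                   # batch estimate of Z (1/M cancels)
    return (k[:, :, None] * diff).sum(axis=1) / z      # (N, D)

def drift_field(x, y_pos, y_neg, tau=1.0):
    # V_{p,q}(x) = V_p^+(x) - V_q^-(x): attraction to y_pos, repulsion from y_neg.
    return mean_shift(x, y_pos, tau) - mean_shift(x, y_neg, tau)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 2))
y_pos = rng.normal(loc=2.0, size=(64, 2))   # stand-in for samples of p_data
y_neg = rng.normal(loc=0.0, size=(64, 2))   # stand-in for samples of q

v = drift_field(x, y_pos, y_neg)
print(v.shape)
```

Because the field is a difference of two terms of the same form, swapping the two batches flips its sign, and passing the same batch for both gives exactly zero. This is the batch-level counterpart of the anti-symmetry property. If the query points `x` are themselves part of `y_neg`, each point contributes a zero displacement to its own repulsion term but still adds to the normalizer; whether to mask that self term is a design choice this sketch leaves open.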
Algorithm)