Generative Modeling via Drifting

Deng et al. 2026



Hozy Summary




Concept) Drifting Model for Generation

  • Setting)
    • \(f : \mathbb{R}^C\rightarrow\mathbb{R}^D\) : a neural network with
      • input : \(\epsilon\sim p_\epsilon\)
      • output : \(\mathbf{x} = f(\epsilon) \in\mathbb{R}^D\) s.t.
        • \(\mathbf{x} = f(\epsilon)\sim q\).
      • Desc.)
        • Pushforward relation between \(p_\epsilon\) and \(q\).
          • \(q = f_{\#}p_\epsilon\).
        • We want to find \(f\) s.t. \(f_{\#}p_\epsilon\approx p_{\text{data}}\)
    • Drift \(\Delta\mathbf{x}\)
      • Def.)
        • Assume an iterative training process on \(f\)
          • where
            • \(\{f_i\}\) : the sequence of models produced over the training iterations
            • \(\{q_i\}\) for \(q_i = [f_i]_{\#}p_\epsilon\)
        • Then, the drift is the change in the sample \(\mathbf{x}_i = f_{i}(\epsilon)\) between consecutive iterations
          • i.e.) \(\Delta\mathbf{x}_i := \mathbf{x}_{i+1}-\mathbf{x}_i = f_{i+1}(\epsilon)-f_{i}(\epsilon)\)
    • Drifting Field \(\mathbf{V}_{p,q}(\cdot)\)
      • Def.)
        • A function that maps a sample \(\mathbf{x} = f(\epsilon)\) to its drift \(\Delta\mathbf{x}\):
          • \(\mathbf{V}_{p,q} : \mathbb{R}^D\rightarrow\mathbb{R}^D\) s.t.
            • \(\mathbf{V}_{p,q}(\mathbf{x}_{i}) = \Delta \mathbf{x}_{i} = \mathbf{x}_{i+1} - \mathbf{x}_{i}\).
            • Or, \(\mathbf{x}_{i+1} = \mathbf{x}_{i} + \mathbf{V}_{p,q}(\mathbf{x}_{i})\)
          • Here, \(p\) is the target distribution, e.g. \(p=p_{\text{data}}\)
      • Instantiation)
  • Goal)
    • Train \(f\) until all \(\mathbf{x}\) stop drifting
      • i.e.) \(\mathbf{V}=\mathbf{0}\)
  • Method)
    • Anti-Symmetric Drifting Field
      • Def.)
        • \(\mathbf{V}_{p,q}(\mathbf{x}) = -\mathbf{V}_{q,p}(\mathbf{x}),\quad\forall\mathbf{x}\).
      • Prop.)
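        • If \(\mathbf{V}\) is anti-symmetric, setting \(q=p\) gives \(\mathbf{V}_{p,p}(\mathbf{x}) = -\mathbf{V}_{p,p}(\mathbf{x})\), hence
          • \(q=p\Rightarrow \mathbf{V}_{p,q}(\mathbf{x}) = \mathbf{0},\quad\forall\mathbf{x}\).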
  • Training Objective)
    • Data space prediction
      • \(\mathcal{L} = \mathbb{E}_{\epsilon}\left[ \left\Vert \underbrace{f_\theta(\epsilon)}_{\text{pred}} - \underbrace{\text{stopgrad}\left( f_\theta(\epsilon) + \mathbf{V}_{p, q_\theta}(f_\theta(\epsilon)) \right)}_{\text{frozen target}} \right\Vert^2 \right]\).
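        • Recall that \(\mathbf{V}=\mathbf{0}\) was the terminating condition; in value the loss equals \(\mathbb{E}_{\epsilon}\left[\Vert\mathbf{V}\Vert^2\right]\), so it vanishes exactly there.
        • Gradients are not backpropagated through \(\mathbf{V}\).
          • Why?) \(\mathbf{V}\) depends on the whole distribution \(q_\theta\), so differentiating through it is non-trivial.
          • Instead, optimize indirectly by moving \(\mathbf{x}=f_\theta(\epsilon)\) toward the frozen target.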
    • Feature space prediction
      • \(\mathcal{L} = \mathbb{E}\left[ \left\Vert \phi(\mathbf{x}) - \text{stopgrad}\left( \phi(\mathbf{x}) + \mathbf{V}(\phi(\mathbf{x})) \right) \right\Vert^2 \right]\).
        • where
          • \(\mathbf{x} = f_\theta(\epsilon)\).
          • \(\mathbf{V}\) is defined on \(\{\phi(\mathbf{x})\mid\mathbf{x} = f_\theta(\epsilon), \epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})\}\).
  • Implementation Details)
    • CFG
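
The data-space objective above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: it assumes a hypothetical one-parameter generator \(f_\theta(\epsilon) = \theta + \epsilon\), a toy target \(p_{\text{data}} = \mathcal{N}(3, 1)\), and a 1-D version of the mean-shift drifting field described in the Tech. section of these notes. Because the target is frozen, the gradient of the loss with respect to \(\theta\) is written out by hand instead of using an autodiff library.

```python
import math
import random


def drift(x, pos, neg, tau=1.0):
    """1-D mean-shift drift: kernel-weighted mean of positives minus that of negatives."""
    def weighted_mean(ys):
        w = [math.exp(-abs(x - y) / tau) for y in ys]
        return sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    return weighted_mean(pos) - weighted_mean(neg)


def train(steps=200, batch=32, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # hypothetical generator: f_theta(eps) = theta + eps
    for _ in range(steps):
        eps = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        data = [rng.gauss(3.0, 1.0) for _ in range(batch)]  # toy p_data = N(3, 1)
        xs = [theta + e for e in eps]                       # samples from q_theta
        # Frozen target x + V(x): treated as a constant (the stopgrad in the loss).
        targets = [x + drift(x, data, xs) for x in xs]
        # Gradient of mean (x - target)^2 w.r.t. theta, with dx/dtheta = 1.
        grad = sum(2.0 * (x - t) for x, t in zip(xs, targets)) / batch
        theta -= lr * grad
    return theta


theta = train()
print(f"learned theta = {theta:.3f} (toy data mean is 3.0)")
```

Since `x - target = -V(x)`, each step moves \(\theta\) along the batch-averaged drift, which is the "indirect" optimization the objective describes; in this toy setting the parameter settles near the data mean once the average drift vanishes.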


Tech.) Drifting Field : Mean-Shift Method of Attraction and Repulsion

  • Model)
    \(\begin{aligned} \mathbf{V}_{p,q}(\mathbf{x}) &= \mathbb{E}_{\mathbf{y}^+\sim p} \mathbb{E}_{\mathbf{y}^-\sim q} \left[ \mathcal{K}(\mathbf{x},\mathbf{y}^+,\mathbf{y}^-) \right] \\ &= \mathbf{V}_{p}^+(\mathbf{x}) - \mathbf{V}_{q}^-(\mathbf{x}) \\ &= \frac{1}{Z_p}\mathbb{E}_{p}\left[ k(\mathbf{x}, \mathbf{y}^+) (\mathbf{y}^+ - \mathbf{x}) \right] - \frac{1}{Z_q}\mathbb{E}_{q}\left[ k(\mathbf{x}, \mathbf{y}^-) (\mathbf{y}^- - \mathbf{x}) \right] \\ &= \frac{1}{Z_pZ_q}\mathbb{E}_{p,q}\left[ k(\mathbf{x}, \mathbf{y}^+) k(\mathbf{x}, \mathbf{y}^-) (\mathbf{y}^+ - \mathbf{y}^-) \right] \end{aligned}\).
    • where
      • \(\displaystyle k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{1}{\tau}\Vert\mathbf{x}-\mathbf{y}\Vert\right)\) : a kernel function
        • for
          • \(\tau\) : the temperature
          • \(\Vert\cdot\Vert\) : \(\ell_2\) norm
      • \(\mathbf{y}^+\sim p_{\text{data}}\).
      • \(\mathbf{y}^-=f(\epsilon)\sim q\) where \(\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})\).
      • \(Z_p = \mathbb{E}_p\left[k(\mathbf{x}, \mathbf{y}^+)\right]\).
      • \(Z_q = \mathbb{E}_q\left[k(\mathbf{x}, \mathbf{y}^-)\right]\).
  • Desc.)
    • Samples \(\mathbf{y}^+\sim p_{\text{data}}\) attract \(\mathbf{x}\).
    • Samples \(\mathbf{y}^-\sim q\) repel \(\mathbf{x}\).
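
The field above can be estimated from finite batches by replacing each expectation with a sample average. The sketch below is one way to do this in plain Python and is not taken from the paper: the function names (`kernel`, `mean_shift`, `drift_field`, `drift_field_pairwise`) and the toy 2-D Gaussian batches are assumptions made for illustration. It computes the field both as the difference \(\mathbf{V}_p^+ - \mathbf{V}_q^-\) and in the pairwise product form of the last line, so the two can be compared numerically.

```python
import math
import random


def kernel(x, y, tau):
    """k(x, y) = exp(-||x - y|| / tau) with the Euclidean norm."""
    return math.exp(-math.dist(x, y) / tau)


def mean_shift(x, ys, tau):
    """Kernel-weighted average of (y - x); dividing by the weight sum plays the role of 1/Z."""
    w = [kernel(x, y, tau) for y in ys]
    z = sum(w)
    return [sum(wi * (y[d] - x[d]) for wi, y in zip(w, ys)) / z
            for d in range(len(x))]


def drift_field(x, pos, neg, tau=1.0):
    """Sample estimate of V_{p,q}(x) = V_p^+(x) - V_q^-(x)."""
    vp = mean_shift(x, pos, tau)
    vq = mean_shift(x, neg, tau)
    return [a - b for a, b in zip(vp, vq)]


def drift_field_pairwise(x, pos, neg, tau=1.0):
    """Same estimate via the product form: sum over pairs of k k (y+ - y-), over Zp Zq."""
    wp = [kernel(x, y, tau) for y in pos]
    wq = [kernel(x, y, tau) for y in neg]
    out = [0.0] * len(x)
    for a, yp in zip(wp, pos):
        for b, yn in zip(wq, neg):
            for d in range(len(x)):
                out[d] += a * b * (yp[d] - yn[d])
    norm = sum(wp) * sum(wq)
    return [v / norm for v in out]


rng = random.Random(0)
pos = [(rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]  # toy "data" batch
neg = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]  # toy "generated" batch
x = (0.5, -0.2)

v = drift_field(x, pos, neg)
v_pair = drift_field_pairwise(x, pos, neg)
v_swapped = drift_field(x, neg, pos)   # roles of p and q exchanged
v_same = drift_field(x, pos, pos)      # q = p
print(v, v_pair, v_swapped, v_same)
```

The difference form costs one pass over each batch per query point, while the pairwise form loops over all positive-negative pairs; the latter is included only to mirror the last line of the derivation. Swapping the two batches negates the estimate, and using the same batch for both gives zero, which matches the anti-symmetry property from the Method section.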



Algorithm)



