
I'm currently trying to wrap my head around the training loss functions for DPMs and how they differ from those of DDPMs, but the two papers describe the process differently, which makes them hard to reconcile.

In "Deep Unsupervised Learning using Nonequilibrium Thermodynamics," the seminal paper for diffusion models states that "Training amounts to maximizing the model log likelihood," (or, maximizing the lower bound on the model log likelihood) which ultimately leads to equation 14, which is given below:

\begin{equation} \begin{split} K = - &\sum_{t=2}^{T} \int d\mathbf{x}_0 \, d\mathbf{x}_t \, q \left( \mathbf{x}_0, \mathbf{x}_t \right) \cdot \\ & D_\text{KL} \left( q \left( \mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0 \right) \parallel p \left( \mathbf{x}_{t-1} \vert \mathbf{x}_t \right) \right) \\ & + H_q \left( \mathbf{X}_T \vert \mathbf{X}_0 \right) - H_q \left( \mathbf{X}_1 \vert \mathbf{X}_0 \right) - H_p \left( \mathbf{X}_T \right). \end{split} \end{equation}

I understand what is happening in this equation without too much issue.

However, the papers that build on this give quite different equations. In particular, "Denoising Diffusion Probabilistic Models" aims to minimize an upper bound on the negative log likelihood (the negation of the quantity above), but presents the following equation instead:

\begin{equation} L_{VLB}= \mathbb{E}_q [\underbrace{D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T))}_{L_T} + \sum_{t=2}^T \underbrace{D_\text{KL}(q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t))}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)}_{L_0} ] \end{equation}
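To ground my reading of this equation, I tried computing one of its terms numerically. In DDPM both $q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0)$ and $p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t)$ are diagonal Gaussians, so each $L_{t-1}$ is a closed-form Gaussian KL. Below is a minimal sketch of that computation; the linear schedule and the posterior formulas follow the DDPM paper (its equations 6-7), while the function names and the stand-in for the network's predicted mean are placeholders of my own:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear beta schedule from the DDPM paper
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # \bar{alpha}_t = prod_{s<=t} alpha_s

def posterior_params(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0), DDPM eqs. 6-7."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    beta_t = betas[t]
    mean = (np.sqrt(ab_prev) * beta_t / (1 - ab_t)) * x0 \
         + (np.sqrt(alphas[t]) * (1 - ab_prev) / (1 - ab_t)) * xt
    var = (1 - ab_prev) / (1 - ab_t) * beta_t
    return mean, var

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q I) || N(mu_p, var_p I) ), summed over dimensions."""
    return np.sum(0.5 * (np.log(var_p / var_q)
                         + (var_q + (mu_q - mu_p) ** 2) / var_p
                         - 1.0))

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)   # a toy "data" vector
t = 500                        # arbitrary middle step (0-based here; the paper's t runs 1..T)
# Sample x_t ~ q(x_t | x_0) via the closed form x_t = sqrt(ab_t) x_0 + sqrt(1 - ab_t) eps.
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * rng.standard_normal(16)

mu_q, var_q = posterior_params(x0, xt, t)
mu_theta = mu_q + 0.01 * rng.standard_normal(16)  # hypothetical model mean; no real network here
# DDPM fixes the reverse variance to an untrained schedule value; reuse var_q as that choice.
print(kl_diag_gaussians(mu_q, var_q, mu_theta, var_q))  # one Monte Carlo sample of L_{t-1}
```

Averaging that quantity over draws of $\mathbf{x}_0$ and $\mathbf{x}_t$ would approximate the outer expectation $\mathbb{E}_q$, which helped me see $L_{VLB}$ as an ordinary training objective, but it doesn't resolve my question about its relation to equation 14.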

I understand that these two equations likely represent extremely similar quantities (the second is also tied to the log likelihood, after all), but I don't understand why the representations are so different, or how to interpret the second equation.
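For what it's worth, both papers appear to start from the same variational bound; writing it in the DDPM paper's notation (this is my reading, not something either paper states in this exact form):

\begin{equation} \mathbb{E}_q \left[ -\log p_\theta(\mathbf{x}_0) \right] \le \mathbb{E}_q \left[ -\log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \vert \mathbf{x}_0)} \right] = L_{VLB}. \end{equation}

If I follow correctly, $K$ would then be the same bound with the opposite sign (a lower bound on the log likelihood rather than an upper bound on its negation), with the $t=1$ and $t=T$ edge terms written as entropies instead of as $L_0$ and $L_T$; I'd appreciate confirmation that this reading is correct.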

Compounding my confusion, appendix A of "Denoising Diffusion Probabilistic Models" walks through a derivation of their $L_{VLB}$ equation and attributes the approach to "Deep Unsupervised Learning using Nonequilibrium Thermodynamics," but I was not able to find the corresponding derivation in that paper.

Which of these equations is the "loss function" for diffusion models? Are they different representations of the same quantity? If not, what is each one, and why is each important? If they are, how should I interpret the second equation?
