
I'm currently trying to wrap my head around the training loss functions for DPMs and how they differ from those of DDPMs, but the two papers describe the process differently, which makes them hard to reconcile.

In "Deep Unsupervised Learning using Nonequilibrium Thermodynamics," the seminal paper for diffusion models states that "Training amounts to maximizing the model log likelihood," (or, maximizing the lower bound on the model log likelihood) which ultimately leads to equation 14, which is given below:

\begin{equation} \begin{split} K = - &\sum_{t=2}^{T} \int d\mathbf{x}_0 \, d\mathbf{x}_t \, q \left( \mathbf{x}_0, \mathbf{x}_t \right) \cdot \\ & D_\text{KL} \left( q \left( \mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0 \right) \parallel p \left( \mathbf{x}_{t-1} \vert \mathbf{x}_t \right) \right) \\ & + H_q \left( \mathbf{X}_T \vert \mathbf{X}_0 \right) - H_q \left( \mathbf{X}_1 \vert \mathbf{X}_0 \right) - H_p \left( \mathbf{X}_T \right). \end{split} \end{equation}

I understand what is happening in this equation without too much issue.

However, the papers that build on this give quite different equations. In particular, "Denoising Diffusion Probabilistic Models" aims to minimize an upper bound on the negative log likelihood (the negation of the quantity above), but presents the following equation instead:

\begin{equation} L_{VLB}= \mathbb{E}_q [\underbrace{D_\text{KL}(q(\mathbf{x}_T \vert \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_T))}_{L_T} + \sum_{t=2}^T \underbrace{D_\text{KL}(q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0) \parallel p_\theta(\mathbf{x}_{t-1} \vert\mathbf{x}_t))}_{L_{t-1}} \underbrace{- \log p_\theta(\mathbf{x}_0 \vert \mathbf{x}_1)}_{L_0} ] \end{equation}
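To ground my reading of this equation, I tried computing one of its terms numerically. In DDPM both $q(\mathbf{x}_{t-1} \vert \mathbf{x}_t, \mathbf{x}_0)$ and $p_\theta(\mathbf{x}_{t-1} \vert \mathbf{x}_t)$ are diagonal Gaussians, so each $L_{t-1}$ is a closed-form Gaussian KL. Below is a minimal sketch of that computation; the linear schedule and the posterior formulas follow the DDPM paper (its equations 6-7), while the function names and the stand-in for the network's predicted mean are placeholders of my own:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear beta schedule from the DDPM paper
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # \bar{alpha}_t = prod_{s<=t} alpha_s

def posterior_params(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0), DDPM eqs. 6-7."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    beta_t = betas[t]
    mean = (np.sqrt(ab_prev) * beta_t / (1 - ab_t)) * x0 \
         + (np.sqrt(alphas[t]) * (1 - ab_prev) / (1 - ab_t)) * xt
    var = (1 - ab_prev) / (1 - ab_t) * beta_t
    return mean, var

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q I) || N(mu_p, var_p I) ), summed over dimensions."""
    return np.sum(0.5 * (np.log(var_p / var_q)
                         + (var_q + (mu_q - mu_p) ** 2) / var_p
                         - 1.0))

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)   # a toy "data" vector
t = 500                        # arbitrary middle step (0-based here; the paper's t runs 1..T)
# Sample x_t ~ q(x_t | x_0) via the closed form x_t = sqrt(ab_t) x_0 + sqrt(1 - ab_t) eps.
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * rng.standard_normal(16)

mu_q, var_q = posterior_params(x0, xt, t)
mu_theta = mu_q + 0.01 * rng.standard_normal(16)  # hypothetical model mean; no real network here
# DDPM fixes the reverse variance to an untrained schedule value; reuse var_q as that choice.
print(kl_diag_gaussians(mu_q, var_q, mu_theta, var_q))  # one Monte Carlo sample of L_{t-1}
```

Averaging that quantity over draws of $\mathbf{x}_0$ and $\mathbf{x}_t$ would approximate the outer expectation $\mathbb{E}_q$, which helped me see $L_{VLB}$ as an ordinary training objective, but it doesn't resolve my question about its relation to equation 14.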

I understand that these two equations likely represent extremely similar quantities (the second is also tied to the log likelihood, after all), but I don't understand why the representations are so different, or how to interpret the second equation.
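For what it's worth, both papers appear to start from the same variational bound; writing it in the DDPM paper's notation (this is my reading, not something either paper states in this exact form):

\begin{equation} \mathbb{E}_q \left[ -\log p_\theta(\mathbf{x}_0) \right] \le \mathbb{E}_q \left[ -\log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \vert \mathbf{x}_0)} \right] = L_{VLB}. \end{equation}

If I follow correctly, $K$ would then be the same bound with the opposite sign (a lower bound on the log likelihood rather than an upper bound on its negation), with the $t=1$ and $t=T$ edge terms written as entropies instead of as $L_0$ and $L_T$; I'd appreciate confirmation that this reading is correct.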

Compounding my confusion, appendix A of "Denoising Diffusion Probabilistic Models" walks through a derivation of their $L_{VLB}$ equation and attributes the approach to "Deep Unsupervised Learning using Nonequilibrium Thermodynamics," but I was not able to find the corresponding derivation in that paper.

Which of these equations is the "loss function" for diffusion models? Are they different representations of the same quantity? If not, what is each one, and why is each important? If they are, how should I interpret the second equation?
