
In Pattern Recognition and Machine Learning (Section 1.6), the author derives the distribution that maximises the differential entropy:

$$H(\textbf{x})=-\int p(\textbf{x}) \ln (p(\textbf{x})) d\textbf{x}$$

To do so, the author imposes three constraints:

$$\int_{-\infty}^{\infty} p(x) dx = 1$$ $$\int_{-\infty}^{\infty} xp(x) dx = \mu$$ $$\int_{-\infty}^{\infty} (x-\mu)^2p(x) dx = \sigma^2$$

This results in the Lagrangian functional:

$$F(p)=-\int_{-\infty}^{\infty} p(x) \ln(p(x)) dx + \lambda_1(\int_{-\infty}^{\infty} p(x) dx - 1) + \lambda_2 (\int_{-\infty}^{\infty} x p(x) dx - \mu) + \lambda_3(\int_{-\infty}^{\infty} (x-\mu)^2 p(x) dx - \sigma^2)$$

Taking the derivative of this functional using the calculus of variations and setting it equal to zero gives:

$$p(x)=\exp(-1+\lambda_1+\lambda_2 x + \lambda_3 (x-\mu)^2)$$
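For reference, the intermediate step can be spelled out. This is a sketch of the standard calculus-of-variations argument, assuming the functional derivative may be taken pointwise under the integral sign:

$$\frac{\delta F}{\delta p(x)}=-\ln(p(x))-1+\lambda_1+\lambda_2 x+\lambda_3(x-\mu)^2=0$$

Solving this for $\ln(p(x))$ and exponentiating gives the expression above.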

The author states that you can find the Lagrange multipliers by back substitution of this result into the three constraint equations, leading to the conclusion that $p(x)$ is a normal density.

I'm wondering how to derive this last step, specifically how to find the Lagrange multipliers. If we substitute back into the constraints we get three integral equations with three unknowns. How would I go about solving these equations?


Assume that $\mu=0$ and $\sigma=1$, and let $z:=\sqrt{\pi}\,e^{-1+\lambda_1}e^{-\lambda_2^2/(4\lambda_3)}$. Assuming that $\lambda_3<0$ (otherwise the integrals diverge), the three constraints become (the integrals are evaluated at the end of this answer)

$$ I_1:=e^{-1+\lambda_1}\int_{-\infty}^{\infty} e^{\lambda_2x+\lambda_3x^2}\,dx=\frac{z}{(-\lambda_3)^{1/2}}=1, $$

$$ I_2:=e^{-1+\lambda_1}\int_{-\infty}^{\infty} xe^{\lambda_2x+\lambda_3x^2}\,dx=\frac{z\lambda_2}{2(-\lambda_3)^{3/2}}=0, \quad\text{and} $$

$$ I_3:=e^{-1+\lambda_1}\int_{-\infty}^{\infty} x^2e^{\lambda_2x+\lambda_3x^2}\,dx=\frac{z\lambda_2^2}{4(-\lambda_3)^{5/2}}+\frac{z}{2(-\lambda_3)^{3/2}}=1. $$

The first equation gives $z=(-\lambda_3)^{1/2}$. Substituting this into the other two, we get

$$ \frac{\lambda_2}{-2\lambda_3}=0\quad\text{and}\quad \frac{\lambda_2^2}{4\lambda_3^2}+\frac{1}{-2\lambda_3}=1, $$

so that $\lambda_2=0$ and $\lambda_3=-1/2$. Finally, $z=(-\lambda_3)^{1/2}=1/\sqrt{2}$ together with $\lambda_2=0$ gives $e^{-1+\lambda_1}=1/\sqrt{2\pi}$, that is, $\lambda_1=1-\ln \sqrt{2\pi}$.

Therefore, $$ p(x)=e^{-\ln \sqrt{2\pi}-x^2/2}=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}. $$
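As a sanity check, the multipliers found above can be substituted back into the three constraints numerically. The following is a minimal sketch using only the Python standard library; the helper `integrate` and the truncation of the real line to $[-12, 12]$ are choices made for this illustration, not part of the derivation.

```python
import math

def integrate(f, lo=-12.0, hi=12.0, n=24000):
    """Composite trapezoid rule on [lo, hi]; the Gaussian tails outside are negligible."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Multipliers obtained for mu = 0, sigma = 1.
lam1 = 1.0 - math.log(math.sqrt(2.0 * math.pi))
lam2 = 0.0
lam3 = -0.5

def p(x):
    # Stationary density exp(-1 + lam1 + lam2*x + lam3*x^2) with mu = 0.
    return math.exp(-1.0 + lam1 + lam2 * x + lam3 * x * x)

norm = integrate(p)                        # should be close to 1
mean = integrate(lambda x: x * p(x))       # should be close to 0
var = integrate(lambda x: x * x * p(x))    # should be close to 1
print(norm, mean, var)
```

All three printed values should agree with the constraint values $1$, $0$, and $1$ up to numerical error.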

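For the general case, substitute $y=(x-\mu)/\sigma$ and let $q(y):=\sigma\,p(\mu+\sigma y)$ be the density of $y$, which has mean $0$ and variance $1$. Then $$ -\int p(x)\ln(p(x))\,dx=-\int q(y)\ln(q(y))\, dy+\ln\sigma, $$ so maximising over $p$ is equivalent to maximising over $q$. By the case above, $q$ is the standard normal density, and hence $$ p(x)=\frac{1}{\sigma}q\left(\frac{x-\mu}{\sigma}\right)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}. $$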
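The change of variables can also be checked numerically. The sketch below takes the standard normal density rescaled by $y=(x-\mu)/\sigma$ and verifies the three constraints for one arbitrary choice of $\mu$ and $\sigma$. It also compares the entropy with the known closed form $\tfrac12\ln(2\pi e\sigma^2)$ for a normal density. The values of `mu` and `sigma` and the helper `integrate` are assumptions for illustration only.

```python
import math

def integrate(f, lo, hi, n=24000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

mu, sigma = 1.5, 2.0  # arbitrary illustrative values

def log_p(x):
    # Log of the rescaled standard normal density (1/sigma) * phi((x - mu) / sigma).
    y = (x - mu) / sigma
    return -0.5 * y * y - math.log(sigma * math.sqrt(2.0 * math.pi))

def p(x):
    return math.exp(log_p(x))

lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma  # tails outside are negligible

norm = integrate(p, lo, hi)
mean = integrate(lambda x: x * p(x), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * p(x), lo, hi)
entropy = -integrate(lambda x: p(x) * log_p(x), lo, hi)
entropy_closed = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
print(norm, mean, var, entropy, entropy_closed)
```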

Evaluation of $I_1$, $I_2$, and $I_3$:

First, recall that for $c>0$, $$ \int_{-\infty}^\infty e^{-cx^2}\,dx=\sqrt{\frac{\pi}{c}}, $$ and notice, by completing the square, that $$ bx-cx^2=-c\left(\frac{b}{2c}-x\right)^2+\frac{b^2}{4c}. $$

Thus, letting $\lambda_1=a$, $\lambda_2=b$, and $\lambda_3=-c$, $$ I_1=e^{-1+a}e^{b^2/(4c)}\int_{-\infty}^\infty e^{-c(b/(2c)-x)^2}\,dx=e^{-1+a}e^{b^2/(4c)}\times \sqrt{\frac{\pi}{c}}. $$

As for the second integral, the integrand below is odd about $x=b/(2c)$, so $$ \int_{-\infty}^\infty \left(x-\frac{b}{2c}\right)e^{-c(b/(2c)-x)^2}\,dx=0, $$ and so $I_2=I_1b/(2c)$.

Finally, differentiating under the integral sign with respect to $c$, $$ \frac{d}{dc}\int_{-\infty}^\infty e^{-c(b/(2c)-x)^2}\,dx =\int_{-\infty}^\infty \left(\frac{b^2}{4c^2}-x^2\right)e^{-c(b/(2c)-x)^2}\,dx, $$ while the left-hand side equals $\frac{d}{dc}\sqrt{\pi/c}$ because the integral does not depend on the shift $b/(2c)$. Therefore, $$ I_3=I_1\frac{b^2}{4c^2}-e^{-1+a}e^{b^2/(4c)}\times\frac{d}{dc}\sqrt{\frac{\pi}{c}}. $$
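The closed forms for $I_1$, $I_2$, and $I_3$ can be compared against direct numerical integration. The sketch below does this for one arbitrary choice of $a$, $b$, and $c>0$, using only the standard library; the test values and the helper `integrate` are assumptions for illustration.

```python
import math

def integrate(f, lo, hi, n=40000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

a, b, c = 0.3, 0.7, 1.3  # arbitrary test values with c > 0

pref = math.exp(-1.0 + a)

def w(x):
    # Integrand of I_1: e^{-1+a} * exp(b*x - c*x^2).
    return pref * math.exp(b * x - c * x * x)

center = b / (2.0 * c)
lo, hi = center - 15.0, center + 15.0  # integrand is negligible outside this window

I1_num = integrate(w, lo, hi)
I2_num = integrate(lambda x: x * w(x), lo, hi)
I3_num = integrate(lambda x: x * x * w(x), lo, hi)

# Closed forms, with z = sqrt(pi) * e^{-1+a} * e^{b^2/(4c)} and -lambda_3 = c.
z = math.sqrt(math.pi) * pref * math.exp(b * b / (4.0 * c))
I1_closed = z / c ** 0.5
I2_closed = z * b / (2.0 * c ** 1.5)
I3_closed = z * b * b / (4.0 * c ** 2.5) + z / (2.0 * c ** 1.5)

print(I1_num - I1_closed, I2_num - I2_closed, I3_num - I3_closed)
```

The three printed differences should be zero up to numerical error.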

