
Given that $\min (D[p_1\parallel p_3],D[p_2\parallel p_4])=a$, how can one find a lower bound on the difference $\left \vert D[p_1\parallel p_2]-D[p_3\parallel p_4] \right\vert$ in terms of $a$? If the condition is weakened to $\max (D[p_1\parallel p_3],D[p_2\parallel p_4])=a$, is it still possible to find such a lower bound?

The original question asked for an upper bound, which is shown below to be impossible.


Given that

$D[p_1\parallel p_3]\leq \epsilon$, $D[p_3\parallel p_1]\leq \epsilon$, $D[p_2\parallel p_4]\leq \epsilon$ and $D[p_4\parallel p_2]\leq \epsilon$,

is it possible to place an upper bound on the quantity $\left \vert D[p_1\parallel p_2]-D[p_3\parallel p_4] \right\vert$ in terms of $\epsilon$? The KL divergence is defined as $D[p\parallel q]\equiv\sum_x p(x)\ln\frac{p(x)}{q(x)}$. Thank you very much.
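In code, this definition reads as follows (a minimal Python sketch; the helper name `kl` is my own, and distributions are represented as tuples of probabilities):

```python
import math

def kl(p, q):
    """KL divergence D[p || q] = sum_x p(x) * ln(p(x)/q(x)).

    Uses the convention 0 * ln(0/q) = 0, so outcomes with
    p(x) = 0 are skipped.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# D[p || p] = 0, and the divergence is asymmetric in general:
print(kl((0.5, 0.5), (0.5, 0.5)))       # 0.0
print(kl((0.5, 0.5), (0.25, 0.75)))     # ~0.144
print(kl((0.25, 0.75), (0.5, 0.5)))     # ~0.131
```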


The KL divergence does not satisfy the triangle inequality, which makes the problem nontrivial; otherwise, the bound would follow readily by adding and subtracting $D[p_2\parallel p_3]$ inside the absolute value. I have tried to use the information projection [https://en.wikipedia.org/wiki/Information_projection] to prove it, but have not found a way yet.

  • I changed $D[p\parallel q],$ coded as D[p||q] to $D[p\parallel q],$ coded as D[p\parallel q]. The former notation doesn't have the horizontal spacing that binary operation and binary relation symbols normally have. Commented Apr 18 at 17:04

1 Answer


$\newcommand\p{\parallel}$The answer is no.

E.g., suppose that $p_1=p_3=(1/2,1/2)$, $p_2=(s,1-s)$, and $p_4=(s^2,1-s^2)$, where $s\downarrow0$.

Then $D[p_1\p p_3]=D[p_3\p p_1]=0$ and $D[p_2\p p_4],D[p_4\p p_2]\to0$, whereas $D[p_1\p p_2]-D[p_3\p p_4]=\tfrac12\ln\tfrac{s^2}{s}+\tfrac12\ln\tfrac{1-s^2}{1-s}=\tfrac12\ln s+\tfrac12\ln(1+s)\to-\infty$.
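This counterexample is easy to check numerically; a quick Python sketch (the helper name `kl` is mine), using the natural-log KL as defined in the question:

```python
import math

def kl(p, q):
    # D[p || q] = sum_x p(x) ln(p(x)/q(x)), natural log
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

for s in (1e-2, 1e-4, 1e-8):
    p1 = p3 = (0.5, 0.5)
    p2 = (s, 1 - s)
    p4 = (s * s, 1 - s * s)
    # The four "closeness" divergences stay small...
    close = max(kl(p1, p3), kl(p3, p1), kl(p2, p4), kl(p4, p2))
    # ...while the difference behaves like (1/2) ln s -> -infinity.
    diff = kl(p1, p2) - kl(p3, p4)
    print(f"s={s:g}: max closeness {close:.2e}, difference {diff:.3f}")
```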


The intuition behind this example is as follows: $D[p\p q]$ is your long-time average surprise if $p$ is the true probability distribution and $q$ is what you think the true probability distribution is. So, if $p$ assigns a low probability (say $s$) to an outcome $o$ and $q$ also assigns a low probability (say $t$) to the outcome $o$ (and both $p$ and $q$ assign very similar, not small probabilities to the outcomes other than $o$), then your long-time average surprise $D[p\p q]$ will be small even if (say) $t$ is $s^2$ and thus much less than $s$. However, if $p$ assigns (say) the same, not low probability to each outcome whereas $q$ assigns a low probability $t$ to an outcome $o$, then your long-time average surprise $D[p\p q]$ will be quite sensitive to how small $t$ is: $t=10^{-100}$ will be perceived quite differently from $t=10^{-2}$.
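A few numbers illustrate the gap between the two regimes described above (computed with the same definition; the helper name `kl` is mine):

```python
import math

def kl(p, q):
    # D[p || q] = sum_x p(x) ln(p(x)/q(x)), natural log
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Both p and q assign a small probability to outcome o:
# even t = s**2 << s barely registers in D[p || q].
s = 1e-2
print(kl((s, 1 - s), (s * s, 1 - s * s)))     # ~0.036 nats

# p assigns probability 1/2 to o, but q assigns a tiny t:
# now D[p || q] is very sensitive to how small t is.
print(kl((0.5, 0.5), (1e-2, 1 - 1e-2)))       # ~1.6 nats
print(kl((0.5, 0.5), (1e-100, 1 - 1e-100)))   # ~114 nats
```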

  • Thank you very much for the useful counterexample. This shows that even an upper bound of the order of $\epsilon \max (D[p_1\parallel p_2], D[p_3\parallel p_4])$ does not exist. Commented Apr 20 at 2:55
  • Additionally, could you comment on the lower bound? If $\min (D[p_1\parallel p_3], D[p_2\parallel p_4])=\epsilon$ (or $\max (D[p_1\parallel p_3], D[p_2\parallel p_4])=\epsilon$), can one find a lower bound on the difference $|D[p_1\parallel p_2]-D[p_3\parallel p_4]|$ in terms of $\epsilon$? Commented Apr 20 at 3:06
  • If $p_1 = p_2$ and $p_3 = p_4$ then clearly $|D[p_1 \parallel p_2] - D[p_3 \parallel p_4]| = 0$ but $D[p_1 \parallel p_3] = D[p_2 \parallel p_4]$ can be anything. Commented Apr 20 at 12:06
  • @RichardBen : I am glad that you found the answer useful. It is better to ask additional questions in separate posts. If you are satisfied with this answer, then these guidelines may now be relevant. Commented Apr 21 at 1:57
