
This question discusses two equivalent ways to express the canonical loss function for a logistic regression, depending on whether you code the categories as $\{0,1\}$ or $\{-1,+1\}$. In the following, let $x_i$ be the $i$th feature vector, $w$ be the parameter vector for the logistic regression, $N$ be the sample size, and $p(y_i)$ be the predicted probability of membership to category $1$.

$$ \text{Logistic Loss}\\ \dfrac{1}{N}\overset{N}{\underset{i=1}{\sum}} \log\left(1 + \exp(-y_i w^Tx_i)\right)\\ y_i\in\{-1,+1\} $$

$$ \text{Log Loss}\\ -\dfrac{1}{N}\overset{N}{\underset{i=1}{\sum}}\left[ y_i \log(p(y_i)) + (1 - y_i)\log(1 - p(y_i)) \right]\\ y_i\in\{0, 1\} $$

What is the algebra showing these two formulations to be equivalent? Not even the proposed duplicate of the first link really shows why the two must give the same loss value, and while both this and this are close, neither quite explicitly shows that $\text{Logistic Loss} = \text{Log Loss}$. I would like to see a chain of equal expressions like $\text{Logistic Loss} =\dots = \text{Log Loss}$.
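Before the algebra, a numerical sanity check may be convincing. Here is a minimal NumPy sketch (assuming the usual sigmoid model $p(y_i) = \left(1+\exp(-w^Tx_i)\right)^{-1}$, and purely synthetic data) showing that the two formulas agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))      # rows are the feature vectors x_i
w = rng.normal(size=d)           # parameter vector
z = rng.integers(0, 2, size=N)   # labels coded {0, 1}
y = 2 * z - 1                    # the same labels coded {-1, +1}

s = X @ w                        # linear scores w^T x_i
p = 1.0 / (1.0 + np.exp(-s))     # p(y_i): predicted P(category 1)

logistic_loss = np.mean(np.log1p(np.exp(-y * s)))
log_loss = -np.mean(z * np.log(p) + (1 - z) * np.log(1 - p))

print(np.isclose(logistic_loss, log_loss))  # True
```

The two values agree to floating-point precision, so what remains is the algebra that forces this.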


3 Answers


Consider the case when $y_i = -1$ in the logistic loss and $y_i = 0$ in the log loss. The summand in the logistic loss becomes $$\log\left(1 + \exp(w^Tx_i)\right)$$ and the summand in the log loss becomes $$-\log(1 - p(y_i))$$ where $p(y_i)$ is, as defined in the question, the predicted probability of membership to category $1$.

Using the following equivalence given in your answer $$ p(y_i) = \dfrac{1}{ 1 + \exp(-w^Tx_i) }\\ \Big\Updownarrow\\ w^Tx_i = \log\left( \dfrac{ p(y_i) }{ 1 - p(y_i) } \right) $$ we can rewrite the summand in the logistic loss as \begin{align} \log\left(1 + \exp(w^Tx_i)\right) &= \log\left(1 + \exp\left(\log\left( \dfrac{ p(y_i) }{ 1 - p(y_i) } \right)\right)\right) \\ &= \log\left(1+ \frac{p(y_i)}{1-p(y_i)}\right) \\ &= \log\left(\frac{1-p(y_i)}{1-p(y_i)} + \frac{p(y_i)}{1-p(y_i)}\right) \\ &= \log\left(\frac{1}{1-p(y_i)}\right) \\ &= -\log\left(1-p(y_i)\right) \end{align} Since $p(y_i)$ denotes the same predicted probability of category $1$ under both codings, the summand for the logistic loss (when $y_i = -1$) is equal to the summand for the log loss (when $y_i = 0$). The case when $y_i = 1$ in the logistic loss and $y_i = 1$ in the log loss can be shown in a similar way.
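As a numerical spot check of this case (a sketch, with `s` standing in for the scores $w^Tx_i$):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.normal(size=10)          # stand-ins for w^T x_i
p = 1.0 / (1.0 + np.exp(-s))     # p(y_i): predicted P(category 1)

lhs = np.log1p(np.exp(s))        # logistic-loss summand at y_i = -1
rhs = -np.log(1.0 - p)           # log-loss summand at y_i = 0
print(np.allclose(lhs, rhs))     # True
```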


Given $y_i\in\{-1,+1\}$ and $z_i\in\{0,1\}$, let $z_i = (y_i+1)/2$, or equivalently $y_i = 2z_i-1$.

Also, $$p(y_i\equiv1) = p(z_i\equiv1)= \left(1+\exp(-w^Tx_i)\right)^{-1}\\ p(y_i\equiv-1) = p(z_i\equiv0)= \left(1 + \exp(w^Tx_i)\right)^{-1}$$

Then $$ \begin{align} \color{red}{\log\left(1 + \exp(-y_i w^Tx_i)\right)} &= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right) - \left(\frac{y_i-1}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)\\ &= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right) - \left(\frac{y_i+(1-2)}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)\\ &= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right) - \left(\frac{y_i+1}{2}-1\right)\log\left(1 + \exp(w^Tx_i)\right)\\ &= z_i\log\left(1/p(z_i\equiv1)\right) - (z_i-1)\log\left(1/p(z_i\equiv0)\right)\\ &= -z_i\log\left(p(z_i\equiv1)\right) + (z_i-1)\log\left(p(z_i\equiv0)\right)\\ &= -z_i\log\left(p(z_i\equiv1)\right) + (z_i-1)\log\left(1-p(z_i\equiv1)\right)\\ &= \color{blue}{-\left(z_i\log\left(p(z_i\equiv1)\right) + (1-z_i)\log\left(1-p(z_i\equiv1)\right)\right)} \end{align} $$


A similar proof can be obtained by reverting to the Bernoulli likelihood.
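For anyone who wants to confirm the chain numerically, here is a short sketch checking that the red and blue expressions agree observation by observation:

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.normal(size=10)              # w^T x_i
y = rng.choice([-1, 1], size=10)     # labels in {-1, +1}
z = (y + 1) // 2                     # recoded labels in {0, 1}
p1 = 1.0 / (1.0 + np.exp(-s))        # p(z_i = 1)

red = np.log1p(np.exp(-y * s))                       # logistic-loss summand
blue = -(z * np.log(p1) + (1 - z) * np.log(1 - p1))  # log-loss summand
print(np.allclose(red, blue))        # True
```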

  • My edit was incorrect? Then why does the minus sign disappear? – Dave (Mar 17, 2023 at 16:40)
  • Thanks for the edit @Dave, but I had a correct minus sign. The minus disappears because the second term is only activated when $y_i \equiv -1$. – Firebug (Mar 17, 2023 at 16:43)
  • I added a line and slightly modified another in your derivation for clarity. Hope that's OK. – mhdadk (Mar 19, 2023 at 14:07)
  • Sure @mhdadk, if that improves clarity for others :) – Firebug (Mar 19, 2023 at 14:08)
  • If it is okay with you, I would like to award a bounty to this answer and then expand on this with a self-answer that I will accept. – Dave (Apr 8, 2023 at 23:22)

Let's start by defining some notation.

$$ y_i\in\{-1,+1\}\\ z_i\in\{0,1\} $$

Then $z_i = (y_i+1)/2 \iff 2z_i = y_i + 1 \iff y_i = 2z_i-1$.

Also, the model assigns the same probabilities under either coding: $$p(y_i\equiv1) = p(z_i\equiv1)= \left(1+\exp(-w^Tx_i)\right)^{-1}\\ p(y_i\equiv-1) = p(z_i\equiv0)= \left(1 + \exp(w^Tx_i)\right)^{-1}$$
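(These two closed forms are complementary probabilities, a fact used later in the derivation; a one-line check:)

```python
import numpy as np

s = np.linspace(-4.0, 4.0, 9)           # sample values of w^T x_i
p_pos = 1.0 / (1.0 + np.exp(-s))        # P(y_i = +1) = P(z_i = 1)
p_neg = 1.0 / (1.0 + np.exp(s))         # P(y_i = -1) = P(z_i = 0)
print(np.allclose(p_pos + p_neg, 1.0))  # True: they sum to 1
```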

When $y_i = -1$, then $\dfrac{y_i + 1}{2} = 0$ and $\dfrac{y_i - 1}{2} = -1$. When $y_i = +1$, then $\dfrac{y_i + 1}{2} = 1$ and $\dfrac{y_i - 1}{2} = 0$. Consequently:

$$ \color{red}{\log\left(1 + \exp(-y_i w^Tx_i)\right)}\\ =\left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i-1}{2}\right)\log\left(1 + \exp(w^Tx_i)\right) $$
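A quick check of this first identity for both label values (a small sketch):

```python
import numpy as np

s = np.linspace(-3.0, 3.0, 7)  # sample values of w^T x_i
for y in (-1, 1):
    lhs = np.log1p(np.exp(-y * s))
    rhs = (((y + 1) / 2) * np.log1p(np.exp(-s))
           - ((y - 1) / 2) * np.log1p(np.exp(s)))
    print(y, np.allclose(lhs, rhs))  # True for both labels
```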

Next, rewrite $-1$ as $1-2$ in the second coefficient:

$$ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i-1}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)\\= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+(1-2)}{2}\right)\log\left(1 + \exp(w^Tx_i)\right) $$

For the fraction on the right, $\dfrac{y_i+(1-2)}{2} = \dfrac{y_i + 1}{2} - 1$, so:

$$ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+(1-2)}{2}\right)\log\left(1 + \exp(w^Tx_i)\right)\\= \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+1}{2}-1\right)\log\left(1 + \exp(w^Tx_i)\right) $$

Since $z_i = (y_i+1)/2$, $p(y_i\equiv1) = \left(1+\exp(-w^Tx_i)\right)^{-1}$, and $p(y_i\equiv-1) = \left(1 + \exp(w^Tx_i)\right)^{-1}$:

$$ \left(\frac{y_i+1}{2}\right)\log\left(1 + \exp(-w^Tx_i)\right)- \left(\frac{y_i+1}{2}-1\right)\log\left(1 + \exp(w^Tx_i)\right)\\= z_i\log\left(1/p(z_i\equiv1)\right)- (z_i-1)\log\left(1/p(z_i\equiv0)\right) $$

Next, a logarithm rule is that $\log(1/x) = -\log(x)$ for $x>0$.

$$ z_i\log\left(1/p(z_i\equiv1)\right)- (z_i-1)\log\left(1/p(z_i\equiv0)\right)\\= -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(p(z_i\equiv 0)\right) $$

Next, $p(z_i \equiv 0) = 1 - p(z_i \equiv 1)$, so:

$$ -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(p(z_i\equiv 0)\right)\\= -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(1-p(z_i\equiv1)\right) $$

Next, factor out the minus sign.

$$ -z_i\log\left(p(z_i\equiv1)\right)+ (z_i-1)\log\left(1-p(z_i\equiv1)\right)\\= {-\left(z_i\log\left(p(z_i\equiv1)\right)- (z_i-1)\log\left(1-p(z_i\equiv1)\right)\right)} $$

Finally, distribute the minus sign across the $z_i - 1$ on the right.

$$ {-\left(z_i\log\left(p(z_i\equiv1)\right)- (z_i-1)\log\left(1-p(z_i\equiv1)\right)\right)}\\= \color{blue}{-\left(z_i\log\left(p(z_i\equiv1)\right)+ (1-z_i)\log\left(1-p(z_i\equiv1)\right)\right)} $$

Since each summand of the logistic loss equals the corresponding summand of the log loss, the two loss functions defined in the question are equal.

  • The point of this answer is to fill in the details of the steps taken in Firebug's answer (which will be getting the bounty once the grace period starts), so if there are details missing to justify each step, please let me know! – Dave (Apr 11, 2023 at 12:19)
