
When calculating the partial derivative $$\frac{\partial}{\partial\theta_{j}}J(\theta)$$ of $$ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}(y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i})))$$ with $$h_{\theta}(x)=g(\theta^{T}x)$$ and $$g(z)=\frac{1}{1+e^{-z}},$$ as documented in the question here: Derivative of cost function for logistic regression, I get a different sign than expected. (Note: Some of my content is taken from that post.)

So, how are we deriving:

$$\frac{\partial}{\partial\theta_{j}}J(\theta) =\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i$$ rather than: $$\frac{\partial}{\partial\theta_{j}}J(\theta) =\sum_{i=1}^{m}(y^i - h_\theta(x^{i}))x_j^i$$ ? When I work it out, I get the latter, rather than the former.

Here's my work (attempted proof):

Let $$a = \log(h_\theta(x^i))$$ and $$ b=\log(1-h_\theta(x^i))$$ Then,

$$ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}(y^{i} a+(1-y^{i}) b)$$

We can then simplify $a$ (writing $\theta x^i$ as shorthand for $\theta^{T} x^i$, spelled out below): $$a = \log(h_\theta(x^i)) $$ $$= \log(\frac{1}{1+e^{-\theta x^i}})$$ $$= \log(1) - \log(1+e^{-\theta x^i}) $$ $$= 0 - \log(1+e^{-\theta x^i}) $$ $$= -\log(1+e^{-\theta x^i})$$

and since $$1 = \frac{(1+e^{-\theta x^i})}{(1+e^{-\theta x^i})}$$

we can simplify $b$:

$$b = \log(1-h_\theta(x^i)) $$ $$= \log(1 - \frac{1}{(1+e^{-\theta x^i})}) $$ $$= \log(\frac{(1+e^{-\theta x^i})}{(1+e^{-\theta x^i})} - \frac{1}{(1+e^{-\theta x^i})}) $$ $$= \log(\frac{(1+e^{-\theta x^i}) - 1}{(1+e^{-\theta x^i})}) $$ $$= \log(\frac{1-1+e^{-\theta x^i}}{(1+e^{-\theta x^i})})$$ $$= \log(\frac{0+e^{-\theta x^i}}{(1+e^{-\theta x^i})})$$ $$= \log(\frac{e^{-\theta x^i}}{(1+e^{-\theta x^i})}) $$ $$= \log(e^{-\theta x^i}) - \log(1+e^{-\theta x^i})$$ $$= -\theta x^i - \log(1 + e^{-\theta x^i}) $$

by regrouping terms, combining over a common denominator, evaluating $\log(e^{-\theta x^i}) = -\theta x^i$, and applying the logarithm property $$\log_b(M/N) = \log_b(M) - \log_b(N)$$

Thus, we simplified to: $$a = -\log(1+e^{-\theta x^i})$$ and $$b = -\theta x^i - \log(1 + e^{-\theta x^i})$$
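As a quick numeric sanity check of these two simplifications (a minimal sketch in Python with numpy; the variable names and the test value of $\theta x^i$ are my own, not from the course):

```python
import numpy as np

# Hypothetical single-example value: z stands in for theta x^i.
z = 0.7
h = 1.0 / (1.0 + np.exp(-z))                  # h_theta(x^i) = g(theta^T x^i)

a_direct = np.log(h)                          # a = log(h_theta(x^i))
a_simplified = -np.log(1.0 + np.exp(-z))      # a = -log(1 + e^{-theta x^i})

b_direct = np.log(1.0 - h)                    # b = log(1 - h_theta(x^i))
b_simplified = -z - np.log(1.0 + np.exp(-z))  # b = -theta x^i - log(1 + e^{-theta x^i})

print(np.isclose(a_direct, a_simplified))     # True
print(np.isclose(b_direct, b_simplified))     # True
```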

Let us then set: $$t = \log(1 + e^{-\theta x^i})$$ so that: $$a = -t$$ and $$b = -\theta x^i - t$$

We may then substitute: $$ J(\theta) =-\frac{1}{m}\sum_{i=1}^{m}(y^{i}\log(h_\theta(x^{i}))+(1-y^{i})\log(1-h_\theta(x^{i}))) $$ $$= -\frac{1}{m}\sum_{i=1}^{m}(y^i a + (1 - y^i) b) $$ Expanding in terms of $t$, we then get: $$J(\theta) =-\frac{1}{m}\sum_{i=1}^{m}(y^i (-t) + (1 - y^i) (-\theta x^i - t)) = -\frac{1}{m}\sum_{i=1}^{m}(-t y^i + (1 - y^i) (-\theta x^i - t)) $$ This allows us to expand via FOIL, since: $$ (1 - y^i) (-\theta x^i - t) = -\theta x^i + \theta x^iy^i - t + ty^i$$

So, $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}(-ty^i + (-\theta x^i + \theta x^iy^i -t + ty^i))$$ and, reordering so the $ty^i$ terms cancel: $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}(-ty^i + ty^i -\theta x^i + \theta x^iy^i -t )$$ $$ = -\frac{1}{m}\sum_{i=1}^{m}(0 -\theta x^i + \theta x^iy^i -t )$$ $$ = -\frac{1}{m}\sum_{i=1}^{m}(-\theta x^i + \theta x^iy^i -t )$$

Now we can expand t back out to reduce further because

$$ -\theta x^i = \log(e^{-\theta x^i})$$ and $$ -\theta x^i = -\log(e^{\theta x^i})$$

So, $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}(-\theta x^i + \theta x^iy^i - \log(1 + e^{-\theta x^i}) )$$

$$ = -\frac{1}{m}\sum_{i=1}^{m}(-\log(e^{\theta x^i}) + \theta x^iy^i - \log(1 + e^{-\theta x^i}))$$

$$ = -\frac{1}{m}\sum_{i=1}^{m}(\theta x^iy^i + (-\log(e^{\theta x^i})) - \log(1 + e^{-\theta x^i}))$$

$$ = -\frac{1}{m}\sum_{i=1}^{m}(\theta x^iy^i + (-[\log(e^{\theta x^i}) + \log(1 + e^{-\theta x^i})]))$$

and since: $$ \log(M) + \log(N) = \log(MN)$$

we can apply the distributive property and simplify the bracketed term: $$ [\log(e^{\theta x^i}) + \log(1 + e^{-\theta x^i})] $$ $$= \log(e^{\theta x^i} \cdot (1 + e^{-\theta x^i})) $$ $$= \log(e^{\theta x^i} + e^{\theta x^i} \cdot e^{-\theta x^i} ) $$ $$= \log(e^{\theta x^i} + e^{(\theta x^i) + (-\theta x^i)} )$$ $$= \log(e^{\theta x^i} + e^{0} )$$ $$= \log(e^{\theta x^i} + 1 )$$
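Before substituting back, here's a one-line numeric check of that identity (a sketch in Python with numpy, using a test value of my own choosing):

```python
import numpy as np

z = 1.3  # hypothetical stand-in for theta x^i
lhs = np.log(np.exp(z)) + np.log(1.0 + np.exp(-z))  # log(e^z) + log(1 + e^{-z})
rhs = np.log(np.exp(z) + 1.0)                       # log(e^z + 1)
print(np.isclose(lhs, rhs))  # True
```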

Substituting the simplified bracketed term back in: $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}(\theta x^iy^i + (-[\log(e^{\theta x^i} + 1 )]))$$

and dropping the extra brackets and parentheses:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}(\theta x^iy^i -\log(e^{\theta x^i} + 1 ))$$
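To convince myself that this simplified form really equals the original cross-entropy expression, here's a small numeric check (a sketch in Python with numpy; the random data, seed, and variable names are mine, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 5, 3                          # hypothetical dataset size and feature count
X = rng.normal(size=(m, p))          # row i is x^i
y = rng.integers(0, 2, size=m)       # labels y^i in {0, 1}
theta = rng.normal(size=p)

z = X @ theta                        # theta x^i for every i
h = 1.0 / (1.0 + np.exp(-z))         # h_theta(x^i)

# Original cross-entropy form of J(theta).
J_original = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Simplified form: -(1/m) * sum(theta x^i y^i - log(e^{theta x^i} + 1)).
J_simplified = -np.mean(z * y - np.log(np.exp(z) + 1.0))

print(np.isclose(J_original, J_simplified))  # True
```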

Next, calculating the partial derivatives:

$$\frac{\partial}{\partial \theta_j} (\theta x^i y^i)= x^i_j y^i $$ because only the $\theta_j x_j^i$ term of $\theta x^i$ depends on $\theta_j$, so every other term drops out and the coefficient $x_j^i y^i$ remains. Then, let:

$$ s = \theta x^i$$ $$ r = 1 + e^{s} $$

By the chain rule: $$\frac{\partial}{\partial \theta_j} \log(r) = \frac{\partial \log(r)}{\partial r} \cdot \frac{\partial r}{\partial s} \cdot \frac{\partial s}{\partial \theta_j}$$ Now, $$\frac{\partial \log(r)}{\partial r} = \frac{1}{r} $$ (since we're really referring to the natural log, $\ln$), $$ \frac{\partial r}{\partial s} = e^s,$$ and $$ \frac{\partial s}{\partial \theta_j} = x_j^i $$ because, again, only the $\theta_j$ term of $s$ contributes. So, substituting: $$\frac{\partial}{\partial \theta_j}\log(r) = \frac{1}{r} e^s x_j^i $$ $$= \frac{1}{1 + e^{s}} e^s x_j^i $$ $$= \frac{1}{1 + e^{\theta x^i}} e^{\theta x^i} x_j^i$$ $$= \frac{e^{\theta x^i} x_j^i}{1 + e^{\theta x^i}}$$ We can then move the $e^{\theta x^i}$ factor from the numerator into the denominator (multiplying numerator and denominator by $e^{-\theta x^i}$) and apply the distributive property again to simplify: $$\frac{\partial}{\partial \theta_j} \log(r) = \frac{x_j^i}{e^{-\theta x^i} (1 + e^{\theta x^i})}$$ $$= \frac{x_j^i}{e^{-\theta x^i} + e^{-\theta x^i} \cdot e^{\theta x^i}}$$ and then, using the exponent multiplication rules: $$ \frac{\partial}{\partial \theta_j} \log(r) = \frac{x_j^i}{e^{-\theta x^i} + e^{(-\theta x^i) + (\theta x^i)}}$$ $$= \frac{x_j^i}{e^{-\theta x^i} + e^{0}}$$ $$= \frac{x_j^i}{1 + e^{-\theta x^i}}$$
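A finite-difference check of this derivative (a sketch in Python with numpy; the example $x^i$, $\theta$, coordinate $j$, and step size are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
x = rng.normal(size=p)       # a single hypothetical example x^i
theta = rng.normal(size=p)
j, eps = 0, 1e-6             # coordinate to test and finite-difference step

def f(t):
    return np.log(1.0 + np.exp(t @ x))   # log(r) = log(1 + e^{theta x^i})

e_j = np.zeros(p)
e_j[j] = eps
numeric = (f(theta + e_j) - f(theta - e_j)) / (2.0 * eps)  # central difference

analytic = x[j] / (1.0 + np.exp(-(theta @ x)))             # x_j^i / (1 + e^{-theta x^i})

print(np.isclose(numeric, analytic))  # True
```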

Using the notation from the link above: $$ \theta x^i:=\theta_0+\theta_1 x^i_1+\dots+\theta_p x^i_p. $$ Recalling that $$ h_\theta(x^i) = g(\theta^T x^i) = \frac{1}{1 + e^{-\theta x^i}}, $$ we can then factor out $x_j^i$:

$$ \frac{\partial}{\partial \theta_j}(\log(r)) = \frac{x_j^i}{(1 + e^{-\theta x^i})} = x_j^i \frac{1}{(1 + e^{-\theta x^i})} = x_j^i h_\theta(x^i)$$

So, our inner partial derivatives are:

$$ \frac{\partial}{\partial \theta_j} (\theta x^i y^i)= x^i_j y^i$$ and $$ \frac{\partial}{\partial \theta_j}(\log(e^{\theta x^i} + 1)) = x_j^i h_\theta(x^i)$$

So, for J: $$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}(\theta x^iy^i -\log(e^{\theta x^i} + 1 ))$$ computing the partial derivative:

$$ \frac{\partial}{\partial \theta_j}(J(\theta)) = -\frac{1}{m}\sum_{i=1}^{m}(\frac{\partial}{\partial \theta_j} (\theta x^i y^i) - \frac{\partial}{\partial \theta_j}(\log(e^{\theta x^i} + 1))) $$ $$ = -\frac{1}{m}\sum_{i=1}^{m}(x^i_j y^i - x_j^i h_\theta(x^i))$$ which we can factor: $$ = -\frac{1}{m}\sum_{i=1}^{m}((y^i - h_\theta(x^i))x^i_j)$$
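As a numeric sanity check, my derived form does match the gradient of $J$ computed by central finite differences (a minimal sketch in Python with numpy; the random data and names are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
m, p = 6, 3                               # hypothetical sizes
X = rng.normal(size=(m, p))
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=p)

def J(t):
    h = 1.0 / (1.0 + np.exp(-(X @ t)))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Analytic gradient in the form derived above: -(1/m) sum (y^i - h_theta(x^i)) x_j^i.
h = 1.0 / (1.0 + np.exp(-(X @ theta)))
grad_analytic = -(X.T @ (y - h)) / m

# Central finite differences, one coordinate at a time.
eps = 1e-6
grad_numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2.0 * eps)
                         for e in np.eye(p)])

print(np.allclose(grad_analytic, grad_numeric))  # True
```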

And yet the course instructor obtained: $$ \frac{\partial}{\partial\theta_{j}}J(\theta) =\sum_{i=1}^{m}(h_\theta(x^{i})-y^i)x_j^i $$

How is this possible?

Comments:

  • "In your final result bring the minus sign inside the parentheses?" – xidgel, Mar 23, 2017 at 21:13
  • "@xidgel Oh my gosh.... I can't believe I didn't see that. Thank you!!!" – devinbost, Mar 23, 2017 at 21:18

1 Answer

The answer is that the course instructor brought the minus sign inside the parentheses (thanks to @xidgel for noticing it). So, $$-\frac{1}{m}\sum_{i=1}^{m}((y^i - h_\theta(x^i))x^i_j)$$

$$= \frac{1}{m}\sum_{i=1}^{m}(-(y^i - h_\theta(x^i))x^i_j)$$ $$= \frac{1}{m}\sum_{i=1}^{m}((-y^i + h_\theta(x^i))x^i_j)$$ $$= \frac{1}{m}\sum_{i=1}^{m}((h_\theta(x^i)-y^i)x^i_j)$$
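A quick numeric confirmation that the two forms are identical (a sketch in Python with numpy; the random data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 4, 2
X = rng.normal(size=(m, p))
y = rng.integers(0, 2, size=m).astype(float)
h = 1.0 / (1.0 + np.exp(-(X @ rng.normal(size=p))))  # h_theta(x^i)

my_form = -(X.T @ (y - h)) / m          # -(1/m) sum (y^i - h_theta(x^i)) x_j^i
instructor_form = (X.T @ (h - y)) / m   # (1/m) sum (h_theta(x^i) - y^i) x_j^i

print(np.allclose(my_form, instructor_form))  # True
```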

Also, when I took a closer look at the instructor's derivative, contrary to what was posted in Derivative of cost function for Logistic Regression, the instructor's expression did still have the factor $\frac{1}{m}$ in front of the sum; it was positive, not negative.

