I've been reading the derivation for SVMs in the book by Chris Bishop (Pattern Recognition and Machine Learning). Equation (7.7) describes the Lagrangian. Note the $\frac{1}{2}$ in front of $||w||^2$, which was chosen arbitrarily.

Then, the derivatives with respect to $w$ and $b$ are set to zero producing equations (7.8) and (7.9).

\begin{align} L(w,b,a) = \frac{1}{2} ||w||^2 - \sum_{n=1}^N a_n (t_n (w^T \phi(x_n)+b)-1) \tag{7.7}\end{align}

Separating the terms,

\begin{align}L(w,b,a) = \frac{1}{2}||w||^2 -\sum_{n=1}^N a_nt_nw^T\phi(x_n) -b\sum_{n=1}^N a_nt_n+\sum_{n=1}^N a_n\tag{7.7a}\end{align}

\begin{align} w = \sum_{n=1}^N a_n t_n \phi(x_n) \tag{7.8}\end{align}

\begin{align} 0 = \sum_{n=1}^N a_n t_n \tag{7.9}\end{align}
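
For reference, these follow from setting the derivatives of (7.7) with respect to $w$ and $b$ to zero:

\begin{align} \frac{\partial L}{\partial w} = w - \sum_{n=1}^N a_n t_n \phi(x_n) = 0, \qquad \frac{\partial L}{\partial b} = -\sum_{n=1}^N a_n t_n = 0 \end{align}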

Then, he substitutes equation (7.8) into (7.7).

Note that as a direct consequence of (7.8) we get:

$$||w||^2 = w^Tw = \sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m) = \sum_{n=1}^N a_nt_n w^T\phi(x_n)\tag{7.8a}$$

Substituting into (7.7a), the first two terms combine as $\frac{1}{2}w^Tw - w^Tw = -\frac{1}{2}w^Tw = -\frac{1}{2}\sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m)$, while the term $-b\sum_{n=1}^N a_nt_n$ vanishes by (7.9). This reduces the Lagrangian to its dual form:

$$L(a) = \sum_{n=1}^N a_n -\frac{1}{2}\sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m)$$
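
As a sanity check (my own numpy sketch, not from the book), this substitution can be verified numerically: for any multipliers $a_n$ satisfying (7.9) and $w$ given by (7.8), evaluating (7.7) matches the dual above for any $b$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 3
Phi = rng.normal(size=(N, D))        # rows are feature vectors phi(x_n)
t = rng.choice([-1.0, 1.0], size=N)  # targets t_n in {-1, +1}

# Arbitrary multipliers, projected so that sum_n a_n t_n = 0 (eq. 7.9).
# Positivity is not needed for the algebraic identity checked here.
a = rng.uniform(0.1, 1.0, size=N)
a -= t * (a @ t) / N                 # t @ t == N since t_n = +/-1

w = (a * t) @ Phi                    # eq. (7.8)
b = rng.normal()                     # arbitrary: its term vanishes by (7.9)

# Primal Lagrangian (7.7) evaluated at this w and b
primal = 0.5 * w @ w - np.sum(a * (t * (Phi @ w + b) - 1.0))

# Dual form: sum_n a_n - 1/2 sum_nm a_n a_m t_n t_m phi(x_n)^T phi(x_m)
K = Phi @ Phi.T                      # Gram matrix
dual = a.sum() - 0.5 * (a * t) @ K @ (a * t)

print(np.isclose(primal, dual))      # True
```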

Herein lies my question: the only reason we are left with $-\frac{1}{2}$ is the arbitrary $\frac{1}{2}$ chosen to accompany $||w||^2$. If we had chosen $1$ instead, the quadratic term would cancel out completely, fundamentally changing the Lagrangian.


1 Answer

The answer occurred to me as I was writing the question, but since I had already put a lot of work into it, I decided to post it and answer it for my own reference. The error in my thinking was assuming that (7.8) would remain unchanged if the multiplier accompanying $||w||^2$ (the objective function) were changed.
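
To spell it out (my own working, not from the book): with a general coefficient $c > 0$ on the objective,

$$L(w,b,a) = c||w||^2 - \sum_{n=1}^N a_n (t_n (w^T \phi(x_n)+b)-1),$$

setting $\partial L/\partial w = 0$ now gives $2cw = \sum_{n} a_n t_n \phi(x_n)$, i.e.

$$w = \frac{1}{2c}\sum_{n=1}^N a_n t_n \phi(x_n).$$

The quadratic term then contributes $\frac{1}{4c}\sum_n\sum_m(\cdot)$ and the cross term $-\frac{1}{2c}\sum_n\sum_m(\cdot)$, which combine to $-\frac{1}{4c}\sum_n\sum_m(\cdot)$, so the dual becomes

$$L(a) = \sum_{n=1}^N a_n - \frac{1}{4c}\sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m).$$

With $c = 1$ the quadratic term does not cancel; it merely picks up a factor of $\frac{1}{4}$ instead of $\frac{1}{2}$.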
