1
$\begingroup$

In the support vector machine model, we want to find a plane, $p=w^Tx+b$ that separates two sets of data (labeled positive and negative) with as large a margin as possible. Referencing the notes here, we see on page 7 the objective function:

$$\min_{w,b} \frac 1 2 w^Tw$$ s.t. $$y^{(i)}(w^Tx^{(i)}+b)\geq 1$$

Now, we can use geometric arguments to see that there must be atleast two points on either side of the separating plane for which the inequalities above are "tight" (i.e. they become equalities). I'm wondering - what if I was just given the optimization problem and had no idea about the geometric interpretation. Would it still be possible to somehow extract this fact? Feel free to use the dual form of the problem on page 12.

Here is how the geometric argument works: Imagine the separating plane has a point that is closest to it on the positive side at distance $d$. The point nearest to it on the negative side is at distance $d+\epsilon$. We can improve the objective function, which is the margin by moving the plane (without changing $w$, only $b$) so that it is $\frac{\epsilon}{2}$ close to the closest negative point.

$\endgroup$

1 Answer 1

0
$\begingroup$

I managed to figure this out, eventually. If you note the very first optimization problem of page 6 of the notes refereed to above, it reads:

$$\max_{\gamma, w, b} \gamma$$ s.t. $$y^{(i)}(w^Tx^{(i)}+b) \geq \gamma$$ $$w^Tw=1$$

Here, $\gamma$ is a dummy variable, $x^{(i)}$ is the $i$th data point in the feature space and $y^{(i)}$ is the label of that point ($+/- 1$).

If we form the Lagrangian (assuming Lagrange multipliers $\alpha_i$ for each constraint, we get:

$$L = -\gamma -\sum_i \alpha_i (y_i(w^Tx^{(i)}+b)-\gamma) -\beta(w^Tw-1)$$

Here, the $\alpha_i$'s are all $\geq 0$.

Now, taking derivative with respect to $\gamma$ we get:

\begin{equation}\sum \alpha_i = 1 \tag{1}\end{equation}

And taking derivative with respect to $b$:

$$\sum y^{(i)} \alpha_i = 0 \tag{2}$$

If we consider $\textit{I}$ to be the set of positive labels and $\textit{J}$ to be the set of negative labels, we can re-write the above equation:

$$\sum_{i \in \textit{I}} \alpha_i = \sum_{j \in \textit{J}} \alpha_j \tag{3}$$

Equations (1) and (3) along with the fact that all the $\alpha_i$'s are $\geq 0$ imply that there must be at least one non-zero $\alpha_i$ in each of the positive and negative classes. And those two constraints must be tight.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .