
I have a question about a property of support vectors of the SVM, stated in subsection "12.2.1 Computing the Support Vector Classifier" of "The Elements of Statistical Learning". My question is very simple and clear, but below I provide a summary from those pages, so you will have a clear understanding of the context:

The optimization problem of finding the decision hyperplane can be expressed as:

$$\min_{\beta,\beta_0}\ \tfrac12\|\beta\|^2 + C\sum_{i=1}^N \xi_i \quad \text{subject to } \xi_i \ge 0,\ \ y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i\ \forall i. \tag{12.8}$$

The Lagrange (primal) function is:

$$L_P = \tfrac12\|\beta\|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i\big[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\big] - \sum_{i=1}^N \mu_i\xi_i. \tag{12.9}$$

Setting the respective derivatives to zero, we get:

$$\beta = \sum_{i=1}^N \alpha_i y_i x_i, \tag{12.10}$$
$$0 = \sum_{i=1}^N \alpha_i y_i, \tag{12.11}$$
$$\alpha_i = C - \mu_i,\ \forall i, \tag{12.12}$$
as well as the positivity constraints $\alpha_i, \mu_i, \xi_i \ge 0\ \forall i$.

The Karush–Kuhn–Tucker conditions include the constraints:

$$\alpha_i\big[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\big] = 0, \tag{12.14}$$
$$\mu_i\xi_i = 0, \tag{12.15}$$
$$y_i(x_i^T\beta + \beta_0) - (1 - \xi_i) \ge 0, \tag{12.16}$$
for $i = 1, \dots, N$.

From (12.10) we see that the solution for $\beta$ has the form

$$\hat\beta = \sum_{i=1}^N \hat\alpha_i y_i x_i,$$

with nonzero coefficients $\hat\alpha_i$ only for those observations $i$ for which the constraints in (12.16) are exactly met (due to (12.14)). These observations are called the support vectors, since $\hat\beta$ is represented in terms of them alone. Among these support points, some will lie on the edge of the margin ($\hat\xi_i = 0$), and hence from (12.15) and (12.12) will be characterized by $0 < \hat\alpha_i < C$.

What I want to know is: how is $\hat\alpha_i < C$ deduced? Paying attention to $\alpha_i + \mu_i = C$ (from (12.12)), we need $\hat\mu_i > 0$ to deduce $\hat\alpha_i < C$, and that does not hold in all cases, because there is no constraint preventing $\hat\xi_i = 0$ and $\hat\mu_i = 0$ from holding at the same time.
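To make the discussion concrete, here is a minimal numerical sketch (my own toy construction, not from the book): it maximizes the soft-margin dual for a 1-D linear-kernel problem by projected gradient ascent, using only (12.10)–(12.12) and the box/equality constraints. The data are chosen (hypothetically) so that all three regimes appear: points off the margin ($\hat\alpha_i = 0$), points on the margin edge ($0 < \hat\alpha_i < C$), and points inside the margin ($\hat\alpha_i = C$).

```python
C = 1.0
# 1-D toy data, symmetric by construction so that beta0_hat = 0 and the
# margin edge falls at x = +/-1 (hence beta_hat = 1):
X = [1.0, 0.3, 3.0, -1.0, -0.3, -3.0]
y = [1, 1, 1, -1, -1, -1]
n = len(X)
z = [y[i] * X[i] for i in range(n)]           # z_i = y_i x_i (linear kernel, 1-D)

def project(v):
    """Project v onto {0 <= a_i <= C, sum_i y_i a_i = 0} (constraints from
    (12.11) and 0 <= alpha_i <= C) by bisection on the multiplier of the
    equality constraint; the clipped sum is monotone in lam."""
    def clipped(lam):
        return [min(C, max(0.0, v[i] - lam * y[i])) for i in range(n)]
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        s = sum(y[i] * a for i, a in enumerate(clipped(mid)))
        if s > 0:
            lo = mid
        else:
            hi = mid
    return clipped((lo + hi) / 2)

# Projected gradient ascent on the dual L_D = sum(alpha) - (z . alpha)^2 / 2.
alpha = [0.0] * n
step = 0.04                                    # < 1 / ||zz^T||, so the iteration is stable
for _ in range(20000):
    beta = sum(z[i] * alpha[i] for i in range(n))    # beta from (12.10)
    grad = [1.0 - z[i] * beta for i in range(n)]     # gradient of the dual
    alpha = project([alpha[i] + step * grad[i] for i in range(n)])

beta = sum(z[i] * alpha[i] for i in range(n))
print("alpha_hat =", [round(a, 3) for a in alpha])   # ~ [0.2, 1.0, 0.0, 0.2, 1.0, 0.0]
print("beta_hat  =", round(beta, 3))                 # ~ 1.0
```

For this construction the points at $x = \pm 1$ sit exactly on the margin edge and indeed get $0 < \hat\alpha_i < C$, the points at $\pm 0.3$ (inside the margin, $\hat\xi_i > 0$) are driven to $\hat\alpha_i = C$, and the far points get $\hat\alpha_i = 0$; but this is one example, not a proof of the direction I am asking about.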

  • Since one of the constraints is $\mu_i \xi_i=0$, if we are on the edge of the margin $\xi_i=0$ and so $\mu_i >0$. Since $\alpha_i = C - \mu_i$, $\alpha_i < C$. From the positivity constraints $\alpha_i, \mu_i, \xi_i \geq 0 \; \forall i$, since in this case $\alpha_i \neq 0$, we get $\alpha_i > 0$.
    – Ted Black
    Commented Jun 25 at 17:17
  • @TedBlack Thank you for your reply. But as I said, I think both $\mu_i$ and $\xi_i$ may be zero at the same time. Don't you agree? Commented Jun 26 at 18:22
  • $\mu_i$ is a slack variable which hopefully has a positive value that forces $\xi_i$ to be optimised (to minimize the objective we would like $\xi_i \geq 0$).
    – Ted Black
    Commented Jun 26 at 19:26
  • @TedBlack So I think you agree that $\hat\alpha_i < C$ is not a deduction. Although, as far as I know, $\xi_i$ is the slack variable and $\mu_i$ is one of the Lagrange multipliers. Commented Jun 26 at 19:39
  • $\hat\alpha_i <C$ is a deduction from (12.15) and (12.12). $\mu_i$ is a slack variable; you can see that since none of the derivatives (12.10), (12.11) and (12.12) are partial derivatives w.r.t. $\mu_i$.
    – Ted Black
    Commented Jun 26 at 19:42
