How does the SMO algorithm for SVMs guarantee the KKT condition "primal feasibility"?

Question

I'm reading Andrew Ng's lecture notes on support vector machines (SVMs) and the sequential minimal optimization (SMO) algorithm (see here). The notes first introduce the Karush-Kuhn-Tucker (KKT) conditions in their general form:

KKT conditions, equations (3), (4), (5), (6) and (7) (sorry, StackExchange doesn't let me embed pictures until I have earned 10 reputation points).

Later, they introduce the following KKT conditions, which are specific for the dual problem of SVMs in the non-separable case:

KKT conditions for dual of SVMs, equations (14), (15) and (16)

These conditions are used for testing the convergence of the SMO algorithm. Now, I don't understand how equations (14), (15) and (16) can guarantee that equations (3), (4), (5), (6) and (7) are fulfilled (particularly equations (5) and (6)). Concretely:

Equations (3) and (4) (Lagrangian stationarity) are clearly fulfilled regardless of equations (14), (15) and (16). This is because Lagrangian stationarity with respect to $w_i$ is forced during the derivation of the dual, and Lagrangian stationarity with respect to $\beta_i$ is guaranteed because SVMs have no equality constraints, so they only have Lagrange multipliers of type $\alpha$, not $\beta$.
Equation (5) (complementary slackness) would be guaranteed if only equations (14) and (16) existed and equation (15) did not. But the case of equation (15) violates complementary slackness, since neither the Lagrange multiplier $\alpha_i$ nor the inequality constraint are equal to $0$: $ \alpha_i = C \neq 0$, and $0 \leq g(x^{(i)}) = 1 - y^{(i)}(w^t x^{(i)}$, which means that $g(x^{(i)})$ could be different from $0$.
Regarding equation (6) (primal feasibility), one of the cases considered by the SMO algorithm is equation (15), which directly violates equation (6):

$\alpha_i = 0 \implies 0 \leq g(x^{(i)}) = 1 - y^{(i)}(w^t x^{(i)} + b)$

Equation (7) (dual feasibility) is also guaranteed by equations (14), (15) and (16).

So my question is: how are these two sets of conditions not contradictory, and how do the KKT conditions of the SMO algorithm guarantee primal feasibility and complementary slackness?

Thank you in advance,

Miguel

PS: This is my first question, sorry in advance for any mistakes.

elexhobby · Accepted Answer · 2018-09-06 01:10:52Z

0

What you are missing is another set of complementary slackness equations that are obtained due to the constraints $\xi_i \ge 0$.

Differentiating $L$ with respect to $\xi_i$, we obtain the first order condition $$C - \alpha_i - r_i = 0, \quad i=1,\dots,m.$$ This implies that $r_i = C-\alpha_i$. Now the complementary slackness equation for $\xi_i \ge 0$ is $$r_i \xi_i = 0 \Rightarrow (C-\alpha_i) \xi_i = 0.$$ If you use this complementary slackness equation, you can easily get equations (14), (15), (16).

answered Sep 6, 2018 at 1:10

elexhobby

1,59711 silver badges18 bronze badges

$\begingroup$ Thanks a lot! :) $\endgroup$
– mgarort
Commented May 11, 2019 at 12:59
$\begingroup$ Why the last equation yields the 3 conditions, though? Could you elaborate on that? $\endgroup$
– MathematicsBeginner
Commented May 10 at 19:03

Add a comment |

Stack Exchange Network

How does the SMO algorithm for SVMs guarantee the KKT condition "primal feasibility"?

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
convex-optimization
machine-learning
duality-theorems
pattern-recognition
.

Hot Network Questions

How does the SMO algorithm for SVMs guarantee the KKT condition "primal feasibility"?

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged convex-optimizationmachine-learningduality-theoremspattern-recognition.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
convex-optimization
machine-learning
duality-theorems
pattern-recognition
.