
These are the resources I've looked into:

  1. MIT lecture
  2. Math.stackexchange question
  3. Slide from MIT
  4. Stanford lecture
  5. Caltech lecture

The professor in (1), at 25:47, says: "I don't care if we find the maximum or minimum (of the Lagrangian function) ... we'll just find the extremum..."

The accepted answer in (2) suggests that we're trying to minimize the (primal) Lagrangian.

In (3), slide 14 explicitly says that we're trying to minimize the function.

In (4), at 24:30, the professor says that we simply set the derivatives of the Lagrangian to zero with respect to the hyperplane parameters ($\boldsymbol{w}$ and $b$); this is, of course, how you find an extremum.

In (5), at 33:09, the professor again says that we're minimizing the function with respect to $\boldsymbol{w}$ and $b$ but maximizing it with respect to the Lagrange multipliers.

Question: I'm still not clear on whether the Lagrangian function, i.e.

$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 + \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right],$$

needs to be minimized (it looks like it does, in which case: why?).

What I understand is that the method of Lagrange multipliers lets you find the extrema of an objective function by setting the gradient of the Lagrangian to zero. But how do we know whether we're minimizing or maximizing the function?

I'm also not sure why, once you set those derivatives to zero and substitute the results back into the Lagrangian, the resulting function needs to be maximized rather than minimized.
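To make the substitution step concrete, here is the algebra written with the common convention that puts a minus sign in front of the sum (a derivation I'm adding for reference, not taken from the lectures). Setting the derivatives to zero gives

$$\frac{\partial \mathcal{L}}{\partial \boldsymbol{w}} = \boldsymbol{w} - \sum_i \alpha_i y_i \boldsymbol{x}_i = 0 \;\Rightarrow\; \boldsymbol{w} = \sum_i \alpha_i y_i \boldsymbol{x}_i, \qquad \frac{\partial \mathcal{L}}{\partial b} = -\sum_i \alpha_i y_i = 0,$$

and substituting these back into $\mathcal{L}$ yields the dual function

$$g(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j \boldsymbol{x}_i^T\boldsymbol{x}_j,$$

which is the function that ends up being maximized over $\alpha_i \geq 0$.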


2 Answers


The primal SVM problem is: $$\min \left\{\frac{1}{2}\|\boldsymbol{w}\|^2 : y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)\geq1 \right\}.$$ Therefore, the Lagrangian should also be minimized over $\boldsymbol{w}$ and $b$. Since the Lagrangian is convex in $\boldsymbol{w}$ and $b$, a critical point corresponds to a global minimum.
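A quick way to verify the convexity claim (a detail I'm adding for completeness): for fixed $\boldsymbol{\alpha}$, the constraint terms are affine in $(\boldsymbol{w}, b)$, so the Hessian of the Lagrangian with respect to $(\boldsymbol{w}, b)$ is

$$\nabla^2_{(\boldsymbol{w},b)}\,\mathcal{L} = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} \succeq 0,$$

which is positive semidefinite; hence any stationary point in $(\boldsymbol{w}, b)$ is a global minimizer.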

What is so beautiful about Lagrange duality is that if the primal is feasible (or strictly feasible, if the constraints are nonlinear), then: $$\min_{w,b} \max_{\alpha \geq 0} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \max_{\alpha \geq 0} \min_{w,b} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}).$$ On the left-hand side you have the primal problem; on the right-hand side, the dual problem. This property is called strong duality (and the feasibility condition is called Slater's condition). Strong duality requires the problem to be convex.
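To see strong duality numerically, here is a minimal sketch (the toy dataset and the use of cvxpy are my own illustrative choices, not part of the original answer):

```python
import numpy as np
import cvxpy as cp

# Toy linearly separable data: two positives, two negatives.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

# Primal: min (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1
w = cp.Variable(d)
b = cp.Variable()
primal = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                    [cp.multiply(y, X @ w + b) >= 1])
primal.solve()

# Dual: max sum(alpha) - (1/2)||sum_i alpha_i y_i x_i||^2
#       s.t. alpha >= 0, sum_i alpha_i y_i = 0
alpha = cp.Variable(n)
dual = cp.Problem(
    cp.Maximize(cp.sum(alpha)
                - 0.5 * cp.sum_squares(X.T @ cp.multiply(y, alpha))),
    [alpha >= 0, y @ alpha == 0])
dual.solve()

print(primal.value, dual.value)  # equal up to solver tolerance (strong duality)

# The primal w is recovered from the dual multipliers: w = sum_i alpha_i y_i x_i
print(w.value, X.T @ (y * alpha.value))
```

Up to solver tolerance, both optimal values agree, and the primal $\boldsymbol{w}$ matches $\sum_i \alpha_i y_i \boldsymbol{x}_i$ recovered from the dual.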

  • Ah, I think this is what I was missing: "Since the Lagrangian is convex in $w$ and $b$, a critical point corresponds to a global minimum." I'll think some more before marking your answer as accepted. Thanks for the quick response.
    – GrowinMan
    Commented Mar 1, 2019 at 7:07
  • Thank you for answering my question. I just asked another question here and would appreciate it if you could take a look: math.stackexchange.com/questions/3133125/…
    – GrowinMan
    Commented Mar 3, 2019 at 3:33

The Lagrangian needs to be minimized. The term $\frac{1}{2}\|\boldsymbol{w}\|^2$ is an inverse measure of the margin. The goal of the SVM formalism is to maximize the margin $2/\|\boldsymbol{w}\|$ so that the decisions become more robust (see 20:30 in the MIT lecture). This is equivalent to minimizing $\|\boldsymbol{w}\|$, or minimizing $\frac{1}{2}\|\boldsymbol{w}\|^2=\frac{1}{2}\boldsymbol{w}^T\boldsymbol{w}$. Normally we write the Lagrangian as

$$\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 - \sum_i \alpha_i\left[y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\right].$$

Note the negative sign in front of the sum. When we write the Lagrangian in this form, we require $\alpha_i\geq 0$, as is standard for inequality-constrained optimization. The sign is negative because we want to minimize $\frac{1}{2}\|\boldsymbol{w}\|^2$ while at the same time making the constraint terms $y_i(\boldsymbol{w}^T\boldsymbol{x}_i+b)-1\geq 0$ as large as possible. Multiplying the sum by $-1$ folds both goals into a single minimization of the Lagrangian.
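Here is a small numerical sketch of that sign logic (the helper function and toy data are illustrative assumptions of mine, not from the lectures): with the minus sign and $\alpha_i \geq 0$, maximizing over $\boldsymbol{\alpha}$ turns violated constraints into an infinite penalty, so $\min_{w,b}\max_{\alpha\geq 0}\mathcal{L}$ reproduces the constrained primal.

```python
import numpy as np

def lagrangian(w, b, alpha, X, y):
    """L(w, b, alpha) = (1/2)||w||^2 - sum_i alpha_i * (y_i (w.x_i + b) - 1)."""
    slacks = y * (X @ w + b) - 1.0        # >= 0 exactly when constraints hold
    return 0.5 * w @ w - alpha @ slacks

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_feas = np.array([1.0, 1.0]); b_feas = -3.0   # separates the toy data
# For a feasible (w, b), every slack is >= 0, so increasing any alpha_i can
# only decrease L; the inner max over alpha >= 0 is attained at alpha = 0 and
# equals the primal objective (1/2)||w||^2.
print(lagrangian(w_feas, b_feas, np.zeros(4), X, y))   # 1.0
print(lagrangian(w_feas, b_feas, np.ones(4), X, y))    # -4.0, i.e. <= 1.0

# For an infeasible (w, b), some slack is negative, and the max over
# alpha >= 0 blows up to +infinity -- so min over (w, b) of the inner max
# reproduces the constrained primal problem.
```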

  • Hi, thanks for answering my question. It makes sense that we're trying to maximize $y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b) \geq 1$ and that this helps minimize the Lagrangian, but what I was getting at is that solving the Lagrangian could give you the minimum or the maximum value for the margin; how do we know it's the minimum? I think the fact that the L2 norm is a convex function with a global minimum at the critical point, as @LinAlg pointed out, really cleared it up for me.
    – GrowinMan
    Commented Mar 3, 2019 at 3:37
