I have a question about a property of the support vectors of an SVM that is stated in subsection "12.2.1 Computing the Support Vector Classifier" of "The Elements of Statistical Learning". The question itself is simple, but below I summarize the relevant passage from those pages so the context is clear:

The optimization problem for finding the decision hyperplane can be expressed as:

$$\min_{\beta,\beta_0}\;\tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i \quad\text{subject to}\quad \xi_i \ge 0,\;\; y_i(x_i^T\beta + \beta_0) \ge 1-\xi_i \;\;\forall i. \tag{12.8}$$

The Lagrange (primal) function is:

$$L_P = \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i\bigl[y_i(x_i^T\beta + \beta_0) - (1-\xi_i)\bigr] - \sum_{i=1}^N \mu_i\xi_i. \tag{12.9}$$

Setting the respective derivatives to zero, we get:

$$\beta = \sum_{i=1}^N \alpha_i y_i x_i, \tag{12.10}$$
$$0 = \sum_{i=1}^N \alpha_i y_i, \tag{12.11}$$
$$\alpha_i = C - \mu_i, \quad \forall i, \tag{12.12}$$

as well as the positivity constraints $\alpha_i, \mu_i, \xi_i \ge 0$ for all $i$.
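
(The book states these results directly; spelling out the differentiation of (12.9) for clarity, the derivatives being set to zero are

$$\frac{\partial L_P}{\partial \beta} = \beta - \sum_{i=1}^N \alpha_i y_i x_i = 0, \qquad \frac{\partial L_P}{\partial \beta_0} = -\sum_{i=1}^N \alpha_i y_i = 0, \qquad \frac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \mu_i = 0.)$$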

The Karush–Kuhn–Tucker conditions include the constraints:

$$\alpha_i\bigl[y_i(x_i^T\beta + \beta_0) - (1-\xi_i)\bigr] = 0, \tag{12.14}$$
$$\mu_i\xi_i = 0, \tag{12.15}$$
$$y_i(x_i^T\beta + \beta_0) - (1-\xi_i) \ge 0, \tag{12.16}$$

for $i = 1, \ldots, N$.

From (12.10) we see that the solution for $\beta$ has the form

$$\hat\beta = \sum_{i=1}^N \hat\alpha_i y_i x_i, \tag{12.17}$$

with nonzero coefficients $\hat\alpha_i$ only for those observations $i$ for which the constraints in (12.16) are exactly met (due to (12.14)). These observations are called the support vectors, since $\hat\beta$ is represented in terms of them alone. Among these support points, some will lie on the edge of the margin ($\hat\xi_i = 0$), and hence from (12.15) and (12.12) will be characterized by $0 < \hat\alpha_i < C$.
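
Written out, the chain of implications I understand the book to be using for such a point $i$ is (my reading; the text does not spell it out):

$$\hat\xi_i = 0 \;\overset{?}{\Longrightarrow}\; \hat\mu_i > 0 \;\overset{(12.12)}{\Longrightarrow}\; \hat\alpha_i = C - \hat\mu_i < C,$$

together with $\hat\alpha_i > 0$ because $i$ is a support vector. The first implication is the step I cannot justify.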

What I want to know is: how is $\hat\alpha_i < C$ deduced? Given (12.12) in the form $\alpha_i + \mu_i = C$, we need $\hat\mu_i > 0$ in order to conclude $\hat\alpha_i < C$, and that does not hold in all cases: there is no constraint preventing $\hat\xi_i = 0$ and $\hat\mu_i = 0$ from holding at the same time.
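
Not part of the book, just a quick numerical illustration of the property in question: a minimal sketch assuming scikit-learn's `SVC`, whose `dual_coef_` attribute stores $y_i\hat\alpha_i$ for the support vectors, so one can check which support vectors end up with $0 < \hat\alpha_i < C$ and whether those are exactly the ones sitting on the margin ($y_i f(x_i) \approx 1$), as the book claims.

```python
# Not from the book: a quick empirical check, assuming scikit-learn's SVC
# (whose dual_coef_ stores y_i * alpha_i for the support vectors).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian classes, so some slacks xi_i are strictly positive.
X = np.vstack([rng.randn(40, 2) + [1.5, 1.5],
               rng.randn(40, 2) - [1.5, 1.5]])
y = np.hstack([np.ones(40), -np.ones(40)])

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_).ravel()                              # alpha_i, one per support vector
margin = y[clf.support_] * clf.decision_function(X[clf.support_])   # y_i * f(x_i)

for a, m in zip(alpha, margin):
    # If the book's claim holds, support vectors with 0 < alpha_i < C should sit
    # on the margin (y_i f(x_i) ~ 1); those with alpha_i = C may have xi_i > 0.
    print(f"alpha_i = {a:.3f} (C = {C}),  y_i f(x_i) = {m:.3f}")
```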
