I have a question about a property of the support vectors of an SVM that is stated in subsection "12.2.1 Computing the Support Vector Classifier" of "The Elements of Statistical Learning". The question itself is simple, but below I summarize the relevant passage from those pages so the context is clear:

The optimization problem for finding the decision hyperplane can be expressed as:

$$\min_{\beta,\beta_0}\;\tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i \quad\text{subject to}\quad \xi_i \ge 0,\;\; y_i(x_i^T\beta + \beta_0) \ge 1-\xi_i \;\;\forall i. \tag{12.8}$$

The Lagrange (primal) function is:

$$L_P = \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i\bigl[y_i(x_i^T\beta + \beta_0) - (1-\xi_i)\bigr] - \sum_{i=1}^N \mu_i\xi_i. \tag{12.9}$$

Setting the respective derivatives to zero, we get:

$$\beta = \sum_{i=1}^N \alpha_i y_i x_i, \tag{12.10}$$
$$0 = \sum_{i=1}^N \alpha_i y_i, \tag{12.11}$$
$$\alpha_i = C - \mu_i, \quad \forall i, \tag{12.12}$$

as well as the positivity constraints $\alpha_i, \mu_i, \xi_i \ge 0$ for all $i$.
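
(The book states these results directly; spelling out the differentiation of (12.9) for clarity, the derivatives being set to zero are

$$\frac{\partial L_P}{\partial \beta} = \beta - \sum_{i=1}^N \alpha_i y_i x_i = 0, \qquad \frac{\partial L_P}{\partial \beta_0} = -\sum_{i=1}^N \alpha_i y_i = 0, \qquad \frac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \mu_i = 0.)$$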

The Karush–Kuhn–Tucker conditions include the constraints:

$$\alpha_i\bigl[y_i(x_i^T\beta + \beta_0) - (1-\xi_i)\bigr] = 0, \tag{12.14}$$
$$\mu_i\xi_i = 0, \tag{12.15}$$
$$y_i(x_i^T\beta + \beta_0) - (1-\xi_i) \ge 0, \tag{12.16}$$

for $i = 1, \ldots, N$.

From (12.10) we see that the solution for $\beta$ has the form

$$\hat\beta = \sum_{i=1}^N \hat\alpha_i y_i x_i, \tag{12.17}$$

with nonzero coefficients $\hat\alpha_i$ only for those observations $i$ for which the constraints in (12.16) are exactly met (due to (12.14)). These observations are called the support vectors, since $\hat\beta$ is represented in terms of them alone. Among these support points, some will lie on the edge of the margin ($\hat\xi_i = 0$), and hence from (12.15) and (12.12) will be characterized by $0 < \hat\alpha_i < C$.
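
Written out, the chain of implications I understand the book to be using for such a point $i$ is (my reading; the text does not spell it out):

$$\hat\xi_i = 0 \;\overset{?}{\Longrightarrow}\; \hat\mu_i > 0 \;\overset{(12.12)}{\Longrightarrow}\; \hat\alpha_i = C - \hat\mu_i < C,$$

together with $\hat\alpha_i > 0$ because $i$ is a support vector. The first implication is the step I cannot justify.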

What I want to know is: how is $\hat\alpha_i < C$ deduced? Given (12.12) in the form $\alpha_i + \mu_i = C$, we need $\hat\mu_i > 0$ in order to conclude $\hat\alpha_i < C$, and that does not hold in all cases: there is no constraint preventing $\hat\xi_i = 0$ and $\hat\mu_i = 0$ from holding at the same time.
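
Not part of the book, just a quick numerical illustration of the property in question: a minimal sketch assuming scikit-learn's `SVC`, whose `dual_coef_` attribute stores $y_i\hat\alpha_i$ for the support vectors, so one can check which support vectors end up with $0 < \hat\alpha_i < C$ and whether those are exactly the ones sitting on the margin ($y_i f(x_i) \approx 1$), as the book claims.

```python
# Not from the book: a quick empirical check, assuming scikit-learn's SVC
# (whose dual_coef_ stores y_i * alpha_i for the support vectors).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian classes, so some slacks xi_i are strictly positive.
X = np.vstack([rng.randn(40, 2) + [1.5, 1.5],
               rng.randn(40, 2) - [1.5, 1.5]])
y = np.hstack([np.ones(40), -np.ones(40)])

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_).ravel()                              # alpha_i, one per support vector
margin = y[clf.support_] * clf.decision_function(X[clf.support_])   # y_i * f(x_i)

for a, m in zip(alpha, margin):
    # If the book's claim holds, support vectors with 0 < alpha_i < C should sit
    # on the margin (y_i f(x_i) ~ 1); those with alpha_i = C may have xi_i > 0.
    print(f"alpha_i = {a:.3f} (C = {C}),  y_i f(x_i) = {m:.3f}")
```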
