
I am reading about regularization in machine learning models. I want to understand, mathematically, how the L2 term penalizes large weights to avoid overfitting. Any explanation?


2 Answers


Intuitively: if you have two ways of fitting your data, such as $y = 2x_1+0x_2$ or $y = x_1 + x_2$, you prefer the latter because the penalty is $2^2+0^2 = 4$ for the former and $1^2+1^2 = 2$ for the latter. In general the effect of each weight on the prediction is linear, but its penalty is quadratic, so it pays to spread the fit across many small weights instead of a few big ones.
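As a minimal sketch of that arithmetic (plain NumPy, weight vectors taken from the example above), the L2 penalty of the two equally predictive fits can be compared directly:

```python
import numpy as np

# Two weight vectors that make the same prediction whenever x1 == x2
w_concentrated = np.array([2.0, 0.0])   # y = 2*x1 + 0*x2
w_spread       = np.array([1.0, 1.0])   # y = 1*x1 + 1*x2

def l2_penalty(w):
    """Sum of squared weights: the term L2 regularization adds to the loss."""
    return np.sum(w ** 2)

print(l2_penalty(w_concentrated))  # 4.0
print(l2_penalty(w_spread))        # 2.0 -> the spread-out solution is cheaper
```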

Mathematically: a good answer to your question can be found here. It basically explains how the squared-weight penalty can be seen as putting a Gaussian prior on your weights.
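In outline (a standard derivation, not a summary of the linked answer): with a Gaussian likelihood on the targets and an independent zero-mean Gaussian prior on each weight, maximizing the posterior over $w$ is the same as minimizing the squared error plus an L2 penalty:

$$
\begin{aligned}
\hat{w}_{\text{MAP}} &= \arg\max_{w}\; p(y \mid X, w)\, p(w)
 = \arg\min_{w}\; \big[-\log p(y \mid X, w) - \log p(w)\big] \\
&= \arg\min_{w}\; \frac{1}{2\sigma^2} \sum_{i} \big(y_i - w^\top x_i\big)^2 + \frac{1}{2\tau^2} \sum_{j} w_j^2 + \text{const},
\end{aligned}
$$

so the familiar $\lambda \sum_j w_j^2$ term corresponds to $\lambda = \sigma^2 / \tau^2$, where $\sigma^2$ is the noise variance and $\tau^2$ the prior variance on the weights.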


Our goal is to minimize the loss function. With L2 regularization we add the sum of the squares of the weights to that loss, so the larger the weights become, the larger the total loss, and the optimizer is pushed toward smaller weights.
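To make that concrete, here is a small sketch (plain NumPy, with hypothetical data; `lam` is the regularization strength) showing how the extra $\lambda \sum_j w_j^2$ term adds $2\lambda w$ to the gradient and shrinks every weight during gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # hypothetical inputs
y = X @ np.array([2.0, 0.0])         # targets generated by one large and one zero weight
lam = 0.1                            # regularization strength (lambda)

def l2_regularized_loss(w):
    """Mean squared error plus the L2 penalty lam * sum(w**2)."""
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def gradient(w):
    # Data-fit gradient plus 2*lam*w, the term that pulls the weights toward zero
    return 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w

w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * gradient(w)           # plain gradient descent

print(w, l2_regularized_loss(w))     # w ends up smaller than [2, 0]: the penalty trades a bit of fit for smaller weights
```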

