
I am reading about regularization in machine learning models. I want to understand, mathematically, how the L2 term penalizes large weights to avoid overfitting. Any explanation?


2 Answers


Intuitively: if you have two ways of fitting your data, such as $y = 2x_1+0x_2$ or $y = x_1 + x_2$, you prefer the latter because the penalty is $2^2+0^2 = 4$ for the former and $1^2+1^2 = 2$ for the latter. In general the effect of each weight on the prediction is linear, but its penalty is quadratic, so it pays to spread the fit across many small weights instead of a few big ones.
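As a minimal sketch of that arithmetic (plain NumPy, weight vectors taken from the example above), the L2 penalty of the two equally predictive fits can be compared directly:

```python
import numpy as np

# Two weight vectors that make the same prediction whenever x1 == x2
w_concentrated = np.array([2.0, 0.0])   # y = 2*x1 + 0*x2
w_spread       = np.array([1.0, 1.0])   # y = 1*x1 + 1*x2

def l2_penalty(w):
    """Sum of squared weights: the term L2 regularization adds to the loss."""
    return np.sum(w ** 2)

print(l2_penalty(w_concentrated))  # 4.0
print(l2_penalty(w_spread))        # 2.0 -> the spread-out solution is cheaper
```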

Mathematically: a good answer to your question can be found here. It basically explains how the squared-weight penalty can be seen as putting a Gaussian prior on your weights.
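In outline (a standard derivation, not a summary of the linked answer): with a Gaussian likelihood on the targets and an independent zero-mean Gaussian prior on each weight, maximizing the posterior over $w$ is the same as minimizing the squared error plus an L2 penalty:

$$
\begin{aligned}
\hat{w}_{\text{MAP}} &= \arg\max_{w}\; p(y \mid X, w)\, p(w)
 = \arg\min_{w}\; \big[-\log p(y \mid X, w) - \log p(w)\big] \\
&= \arg\min_{w}\; \frac{1}{2\sigma^2} \sum_{i} \big(y_i - w^\top x_i\big)^2 + \frac{1}{2\tau^2} \sum_{j} w_j^2 + \text{const},
\end{aligned}
$$

so the familiar $\lambda \sum_j w_j^2$ term corresponds to $\lambda = \sigma^2 / \tau^2$, where $\sigma^2$ is the noise variance and $\tau^2$ the prior variance on the weights.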


Our goal is to minimize the loss function. With L2 regularization we add the sum of the squares of the weights to that loss, so the larger the weights become, the larger the total loss, and the optimizer is pushed toward smaller weights.
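To make that concrete, here is a small sketch (plain NumPy, with hypothetical data; `lam` is the regularization strength) showing how the extra $\lambda \sum_j w_j^2$ term adds $2\lambda w$ to the gradient and shrinks every weight during gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # hypothetical inputs
y = X @ np.array([2.0, 0.0])         # targets generated by one large and one zero weight
lam = 0.1                            # regularization strength (lambda)

def l2_regularized_loss(w):
    """Mean squared error plus the L2 penalty lam * sum(w**2)."""
    return np.mean((X @ w - y) ** 2) + lam * np.sum(w ** 2)

def gradient(w):
    # Data-fit gradient plus 2*lam*w, the term that pulls the weights toward zero
    return 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w

w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * gradient(w)           # plain gradient descent

print(w, l2_regularized_loss(w))     # w ends up smaller than [2, 0]: the penalty trades a bit of fit for smaller weights
```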

