
I'm using this Java machine learning library: https://sites.google.com/site/qianmingjie/home/toolkits/laml

From the library I'm using Logistic Regression: http://web.engr.illinois.edu/~mqian2/upload/projects/java/LAML/doc/ml/classification/LogisticRegression.html

This class supports the following regularization options (the corresponding penalty terms are written out below the list):

  • 0: No regularization
  • 1: L1 regularization
  • 2: L2^2 regularization
  • 3: L2 regularization
  • 4: Infinity norm regularization
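
For reference, the penalties these options add to the loss are, in standard notation (types 1 through 4 respectively; these are the textbook definitions, and I have not verified the exact scaling conventions LAML uses internally):

% Standard penalty terms for types 1-4; LAML's internal scaling may differ.
\[
\lambda\lVert\mathbf{w}\rVert_1,\qquad
\lambda\lVert\mathbf{w}\rVert_2^2,\qquad
\lambda\lVert\mathbf{w}\rVert_2,\qquad
\lambda\lVert\mathbf{w}\rVert_\infty
\]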

You basically create a logistic regression classifier with code like this:

int regularizationType = 1;
double lambda = 0.1;
Classifier logReg = new LogisticRegression(regularizationType, lambda);
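
For a fuller picture, here is roughly how training and prediction would look. The constructor is taken from the Javadoc above and the package path ml.classification comes from the linked doc URL, but feedData, feedLabels, train, and predict are only my guesses at the rest of the Classifier API; verify the actual method names against the Javadoc before running this:

import ml.classification.Classifier;
import ml.classification.LogisticRegression;

public class LamlSketch {
    public static void main(String[] args) {
        // Toy data: 4 examples with 2 features each (replace with real data).
        double[][] X = {{0.0, 1.0}, {1.0, 0.0}, {0.9, 0.1}, {0.1, 0.8}};
        int[] y = {0, 1, 1, 0};

        int regularizationType = 1;  // 1 = L1 regularization
        double lambda = 0.1;
        Classifier logReg = new LogisticRegression(regularizationType, lambda);

        // ASSUMPTION: the four calls below are guessed from the Javadoc's
        // Classifier interface; the real method names/signatures may differ.
        logReg.feedData(X);
        logReg.feedLabels(y);
        logReg.train();
        int[] predictions = logReg.predict(X);
    }
}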

When I tried it, I noticed something odd:

As far as I know, the idea of regularization is to keep the weights as small as possible, and so lambda penalizes large weights. So one should use a large lambda to regularize. However, when I used L1 regularization with lambda = 1, the performance was worse than with lambda = 0.0001. In fact, the best performance I got was with lambda = 0!

My questions:

1- How can logistic regression without regularization perform better than with regularization? Isn't the whole idea of regularization to make performance better?

2- Should I use large values for the regularization parameter?

3- Is using regularization in general always good?

  • How can using question without grammar is better?
    – Octopus
    Commented May 15, 2015 at 21:16
  • Not all regression is about predictive performance.
    Commented May 15, 2015 at 23:11

1 Answer

As far as I know, the idea of regularization is to keep the weights as small as possible, and so lambda penalizes large weights.

Deep down, regularization is really about preventing your weights from fitting the "noise" in your problem, i.e., overfitting. If your data contain more noise (as measured, say, by the standard deviation of the noise distribution), then you will need more regularization to prevent overfitting. It's not really about keeping the weights small for their own sake.
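
For concreteness, L1-regularized logistic regression is conventionally written as minimizing the objective below, with labels $y_i \in \{-1, +1\}$ and examples $\mathbf{x}_i$ (this is the standard textbook form; I have not checked the exact scaling LAML uses):

% Average logistic loss plus an L1 penalty weighted by lambda.
\[
J(\mathbf{w}) \;=\; \frac{1}{n}\sum_{i=1}^{n}
    \log\!\left(1 + e^{-y_i\,\mathbf{w}^{\top}\mathbf{x}_i}\right)
  \;+\; \lambda\,\lVert\mathbf{w}\rVert_1
\]

A larger $\lambda$ shifts the balance away from fitting the data and toward shrinking the weights; the right balance depends on how much of what you would be fitting is noise.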

So one should use a large lambda to regularize.

With regularization, it's best to avoid such definite statements. Sometimes bigger is better, sometimes not.

However, when I used L1 regularization with lambda = 1, the performance was worse than with lambda = 0.0001. In fact, the best performance I got was with lambda = 0!

By my reasoning above, it is not true that a bigger lambda means better performance. It depends on the noise level, among other things. In fact, you can always set lambda = 1000000 and all your weights will be zero. Choosing lambda correctly can be something of a subtle art.
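
To see why an enormous lambda zeroes everything out under L1: solvers based on proximal gradient descent (an assumption on my part; I don't know which optimizer LAML actually uses) handle the L1 term with a soft-thresholding step,

% Soft-thresholding (proximal step for the L1 penalty).
\[
w_j \;\leftarrow\; \operatorname{sign}(z_j)\,
    \max\bigl(\lvert z_j \rvert - \eta\lambda,\; 0\bigr),
\]

where $z_j$ is the ordinary gradient-step value for weight $j$ and $\eta$ is the step size. Once $\eta\lambda$ exceeds every $\lvert z_j \rvert$, every weight gets clamped to exactly zero, which is the lambda = 1000000 scenario above.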

To your questions:

1- How can logistic regression without regularization perform better than with regularization? Isn't the whole idea of regularization to make performance better?

More often than not, regularization will improve the performance of your model. It sounds to me like you're considering one specific application and/or dataset, in which case it is very possible that regularization doesn't help for this specific problem.

However, without knowing what you mean by "better performance", it's hard to tell. What have you done to test the generalization performance of your model? lambda = 0 is always going to perform better on the training data, but what you should care about is performance on held-out test data.

2- Should I use large values for the regularization parameter?!

See above - this is something of an art, and you need to balance lambda against the noise level in your specific problem. Are you familiar with, or have you tried, techniques such as cross-validation for selecting hyperparameters? A sketch of how that might look is below.
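
For instance, a minimal k-fold cross-validation loop for picking lambda might look like this. Everything here is hypothetical scaffolding rather than LAML API: you would implement the FoldEvaluator hook by training your classifier on the training fold with the given lambda and returning accuracy on the held-out fold.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class LambdaSelection {

    // Hypothetical hook: train on the training fold with the given lambda
    // and return accuracy on the held-out fold. Wire this up to whatever
    // classifier you use (e.g., the LogisticRegression class above).
    interface FoldEvaluator {
        double evaluate(int[] trainIdx, int[] testIdx, double lambda);
    }

    // Plain k-fold cross-validation over a grid of lambda values:
    // returns the lambda with the best mean held-out accuracy.
    static double selectLambda(int n, int k, double[] grid, FoldEvaluator eval) {
        // Shuffle the indices once so the folds are random.
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(42));

        double bestLambda = grid[0];
        double bestScore = Double.NEGATIVE_INFINITY;
        for (double lambda : grid) {
            double meanAcc = 0.0;
            for (int fold = 0; fold < k; fold++) {
                List<Integer> train = new ArrayList<>();
                List<Integer> test = new ArrayList<>();
                // Every k-th index lands in the held-out fold.
                for (int i = 0; i < n; i++) {
                    (i % k == fold ? test : train).add(idx.get(i));
                }
                meanAcc += eval.evaluate(toArray(train), toArray(test), lambda) / k;
            }
            if (meanAcc > bestScore) {
                bestScore = meanAcc;
                bestLambda = lambda;
            }
        }
        return bestLambda;
    }

    static int[] toArray(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).toArray();
    }
}

Sweep lambda over a logarithmic grid, e.g. {0, 1e-4, 1e-3, 1e-2, 0.1, 1}, and keep the value with the best mean held-out accuracy; repeating the whole procedure over a few random shuffles guards against a lucky split.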

3- Is using regularization in general always good?

See answer to 1).

  • Good point about $\lambda = 0$ on the training data.
    Commented May 15, 2015 at 17:33
  • I split my data into two parts: 80% training and 20% testing. I trained my classifier on the training set and tested the performance on the test set.
    – Jack Twain
    Commented May 17, 2015 at 17:50
  • So my trained classifier achieved the best performance on the test set when the classifier was trained with lambda = 0.
    – Jack Twain
    Commented May 17, 2015 at 17:51
  • Did you try multiple, random 80/20 splits? If so, did lambda = 0 always perform the best? Without knowing more about your data and your code it's hard to say anything definitive.
    – JohnA
    Commented May 18, 2015 at 13:51
