Regularization in Regression

Rahull Trehan
May 31, 2021


Regularization is one of the key techniques used in regression to address overfitting and model complexity when too many variables are present in the dataset. As in my earlier articles, I will try to cover the basic concept behind regularization without getting into too many technicalities.

In regression, if we have too many variables or coefficients, we may end up with an overfitting model that is highly complex. Such a model may have a low training error, but it will perform poorly on test data or when predicting on new values.

To handle such a dataset, where we have too many variables, we can use regularization techniques.

The concept behind regularization is to penalize a model that has a high number of variables. To apply this penalty, we add a term based on the coefficients to the actual error, and the resulting penalized error is called the complexity error.

The main reason we penalize a complex model is that, in most scenarios, simpler or less complex models tend to generalize and predict better.

There are two types of regularization techniques:

L1 Regularization: we simply add the absolute values of the coefficients to the error.

L2 Regularization: very similar to L1 regularization, except that we square the coefficients to calculate the complexity error.
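
To make this concrete, here is a minimal NumPy sketch with made-up coefficients and a made-up training error, purely to illustrate how the two penalty terms are formed and added to the error:

```python
import numpy as np

# Hypothetical fitted coefficients and training error, purely for illustration
coefficients = np.array([2.0, -0.5, 3.0, 0.0])
training_error = 10.0  # e.g. mean squared error on the training set

# L1 regularization: add the sum of absolute coefficient values to the error
l1_penalty = np.sum(np.abs(coefficients))
l1_total_error = training_error + l1_penalty

# L2 regularization: add the sum of squared coefficient values to the error
l2_penalty = np.sum(coefficients ** 2)
l2_total_error = training_error + l2_penalty

print(l1_total_error)  # 10 + (2 + 0.5 + 3 + 0)    = 15.5
print(l2_total_error)  # 10 + (4 + 0.25 + 9 + 0)   = 23.25
```

Notice that a model with more (or larger) coefficients ends up with a larger penalty, and therefore a larger total error.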

You may think that this penalty is unfair to a complex regression model, since it will by default end up with a higher total error in the majority of cases.

This is where the lambda (𝜆) parameter comes in, which I have discussed in another article. The lambda (𝜆) parameter controls how strongly the complexity error is punished, so a complex model can still end up with a lower total error.
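
As a rough sketch of how lambda (𝜆) scales the penalty, using the same made-up numbers as above:

```python
import numpy as np

coefficients = np.array([2.0, -0.5, 3.0, 0.0])
training_error = 10.0            # hypothetical training error of a complex model

l2_penalty = np.sum(coefficients ** 2)   # 13.25

# A smaller lambda punishes complexity less, a larger lambda punishes it more
for lam in [0.01, 0.1, 1.0]:
    total_error = training_error + lam * l2_penalty
    print(lam, total_error)
# 0.01 -> 10.1325, 0.1 -> 11.325, 1.0 -> 23.25
```

With a small enough lambda, the complex model's total error stays close to its training error. In libraries such as scikit-learn, this parameter is exposed as alpha in the Ridge and Lasso estimators.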

I hope you liked this article. Thank you for reading.
