Regularization is used in machine learning, statistics, and inverse problems.
In this article, we will take a closer look at L1 and L2 regularization in machine learning. These techniques are widely used to address problems that arise during training and affect the predictive performance of your model.
Understanding the fundamentals of linear regression
Before we look at the concepts of regularization, let’s touch on the topic of linear regression in ML.
When it comes to AI development services, many predictive features are based on a linear regression algorithm. Credit scoring, health prediction, weather prediction, and even sports results: all of these can be predicted with the help of linear regression algorithms.
The concept of linear regression assumes a linear relationship between the outcome variable and one or more independent variables.
For example, let’s say that the outcome variable (y) is the grade a student got for their homework. This grade depends on many independent variables (x): time spent on the task, resources used, intelligence level, gender, nationality, and hundreds of other factors.
So in machine learning, you can predict the outcome of one event (y) from the other, independent factors (x).
It’s worth understanding that this only works in one direction. If the teacher deliberately lowers the grade, the student’s effort will not diminish.
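As a quick illustration of this idea, here is a minimal sketch in Python (using scikit-learn) that fits a linear regression predicting a grade from two made-up features; the feature choices and numbers are purely hypothetical.

```python
# A minimal sketch: predicting a grade (the outcome y) from independent
# variables (x) with ordinary linear regression. All values are invented
# purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [hours spent on the task, resources used]; each target: grade (0-100)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 4]])
y = np.array([55, 58, 70, 74, 85, 88])

model = LinearRegression().fit(X, y)
print("weights (coefficients):", model.coef_)
print("intercept:", model.intercept_)
print("predicted grade for 4 hours and 2 resources:", model.predict([[4, 2]])[0])
```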
Now we’re getting closer to deciphering the concept of regularization.
What is regularization?
Regularization is a family of methods that simplify the structure of an ML model in some way, so that it does not rely on overly complex combinations of variables and thus does not overfit.
It allows developers to make the model more efficient and achieve better results.
But how does it work?
As mentioned above, an ML model calculates the outcome of an event based on certain independent variables (factors). Some of these factors have a significant effect on the outcome, while others do not.
Which factors should our model ignore, and which should it prioritize?
This is where regularization comes into play. It helps get rid of unnecessary information. In machine learning, this is called noise.
Apart from that, regularization helps us fight against overfitting.
What is overfitting?
There are two problems in machine learning:
1. Noise
Noise refers to insignificant data: attributes that carry no useful signal. You do not want your model to rely on this data, but at the same time, it is present in the dataset and cannot simply be ignored.
For example, suppose you study the effectiveness of teaching at several universities. The gender of the students can be treated as noise in that case: an unnecessary attribute that should not affect the calculations.
2. Overfitting
Before feeding your model with actual, relevant data, you need to train it, which takes time and resources.
If during training the model learns the quirks of the training set too closely, then as soon as you bring new data into the model for analysis, it will show low accuracy.
For example, you train the model to analyze the performance of students and find out who will graduate with a degree. The model analyzes 50,000 questionnaires of former and current students and eventually produces predictions that are 99% accurate on the data it was trained on.
However, when you take this model and apply it to a new group, say, liberal arts students, you get predictions with only 50% accuracy. The reason is that the model absorbed too many details specific to the training data, which led to overfitting.
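To see what this looks like in practice, here is a small, self-contained sketch (on synthetic data, not the student example) of how an overly flexible model scores almost perfectly on the data it was trained on but much worse on data it has never seen.

```python
# A toy illustration of overfitting: a model with too much freedom
# memorizes the noise in its training data and generalizes poorly.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))
y = 3 * X.ravel() + rng.normal(scale=0.3, size=30)   # simple linear truth + noise

X_train, y_train = X[:20], y[:20]
X_test, y_test = X[20:], y[20:]

# A degree-15 polynomial has enough freedom to memorize the training noise.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

print("train R^2:", r2_score(y_train, overfit.predict(X_train)))  # usually close to 1.0
print("test  R^2:", r2_score(y_test, overfit.predict(X_test)))    # typically much lower
```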
The problem with noise can be partially solved by applying appropriate filters to the data. Overfitting, however, is mainly addressed with regularization.
In machine learning, regularization introduces a system of penalties for overly large coefficients and helps get rid of unnecessary noise.
L1 and L2 regularization
There are three main types of regularization, but we will mainly cover L1 and L2 regularization.
- The idea of L1 regularization (Lasso) is to reduce the set of variables to the most important factors that affect the final result. In effect, this method works as a way of selecting features (see the sketch after this list).
It is expressed by the formula: L1 = Σ(yᵢ − ŷᵢ)² + λΣ|aᵢ|, where yᵢ are the observed values, ŷᵢ the predicted values, aᵢ the model coefficients, and λ the penalty strength.
- L2 regularization, or Ridge regression, takes a similar approach and serves the same purpose. However, because it penalizes the squares of the coefficients, it shrinks weights toward zero without ever setting them exactly to zero. In practice this means the method is not suitable for feature selection, which is its main difference from L1 regularization.
It is expressed by the formula: L2 = Σ(yᵢ − ŷᵢ)² + λΣaᵢ².
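The two formulas above can be written out directly in code. This is a minimal sketch assuming y_true holds the observed outcomes, y_pred the model's predictions, and a the model's coefficients; all names and values are illustrative.

```python
# Computing the L1- and L2-penalized losses from the formulas above.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])      # observed outcomes y_i
y_pred = np.array([2.8, 5.3, 6.9])      # predicted outcomes
a = np.array([0.5, -1.2, 0.0, 2.0])     # model coefficients a_i
lam = 0.1                               # penalty strength λ

sse = np.sum((y_true - y_pred) ** 2)    # squared-error part, common to both

l1_loss = sse + lam * np.sum(np.abs(a))   # L1 (Lasso): penalize |a_i|
l2_loss = sse + lam * np.sum(a ** 2)      # L2 (Ridge): penalize a_i squared

print("L1 loss:", l1_loss)
print("L2 loss:", l2_loss)
```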
All types of regularization use a penalty term. Its strength is denoted by the Greek letter lambda (λ). It is this penalty that reduces the influence of noise in the data.
What is the purpose of the penalty?
Large coefficients of independent variables (called weights) destabilize our model.
If your model performs very differently on training data and on real data, it will be unreliable. To minimize this gap, we reduce the weights by adding a regularization term to the loss function.
Simply put, we tell our model: “do not increase the weights, even if doing so would slightly reduce the training error, because we believe the model generalizes better when the sum of the absolute values of the weights stays small.”
This is exactly the penalty used in L1, or Lasso, regularization.
However, in L2 regularization (Ridge regression), the penalty is represented by the sum of the squares of the coefficients of the variables.
The idea is that we add a “ridge” to the error surface, so that the total error grows quickly as the weights move away from zero. In this way, we minimize the issue of large weights, as the penalty immediately starts to dominate.
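The practical difference between the two penalties can be seen by fitting scikit-learn's Lasso and Ridge estimators on the same synthetic data: Lasso tends to set the weights of unhelpful features to exactly zero, while Ridge only shrinks them. A short sketch, with made-up data:

```python
# Lasso (L1) vs Ridge (L2) on synthetic data where only two of five
# features actually matter. The rest are pure noise.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features influence the target; the other three are noise.
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)   # alpha plays the role of λ
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))  # noise features tend to be exactly 0
print("Ridge coefficients:", np.round(ridge.coef_, 3))  # noise features small but nonzero
```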
Wrapping up: what is the best way to overcome overfitting?
According to a post by James D. McCaffrey, a research software engineer at Microsoft Research, there is no silver bullet that will solve overfitting for you.
Both L1 and L2 regularization have their benefits and side effects.
L1 regularization cannot easily be combined with certain training algorithms, because its absolute-value penalty is not differentiable at zero, while L2 works with virtually any of them.
Therefore, the choice of regularization technique strictly depends on a particular problem you need to solve.
About the author:
Robyn McBride is a journalist, tech critic, author of articles about software, AI and design. She is interested in modern image processing, tech trends and digital technologies. Robyn also works as a proofreader at Computools.