Addressing Overfitting

Thala · July 10, 2022, 4:00pm

How exactly can adding more training examples help to reduce overfitting?

Sayed_Mahmoud · July 10, 2022, 4:12pm

If you add new data without increasing the polynomial degree of the model, the regression line (the prediction line) becomes smoother: the model is trained on the additional data as well, so the parameters (weights) change.

rmwkwok · July 11, 2022, 1:23am

Hi @Thala, let’s illustrate this! Below, from top to bottom, I plotted three types of dataset: (1) noise-free data; (2) noise-added data; and (3) more noise-added data.

In (1), the noise-free data, the simplest model we need is y = w_1x + b. Even if we over-parameterize the model, for example with y = w_1x + w_2x^2 + w_3x^3 + b, we still won’t overfit: after training, we will get w_2 ≈ w_3 ≈ 0, which is equivalent to the simplest model.
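To make (1) concrete, here is a minimal sketch. The line y = 2x + 1 and the 20 sample points are made-up values for illustration, and I use NumPy's closed-form least-squares fit (`np.polyfit`) in place of gradient-descent training, so treat it as an approximation of "after training" rather than the exact procedure behind the plots.

```python
import numpy as np

# Hypothetical noise-free data from the line y = 2x + 1 (values chosen for illustration).
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x + 1.0

# Least-squares fit of the over-parameterized cubic model.
# np.polyfit returns coefficients from the highest power down: [w_3, w_2, w_1, b].
w3, w2, w1, b = np.polyfit(x, y, deg=3)

print(f"w_1 = {w1:.3f}, w_2 = {w2:.3f}, w_3 = {w3:.3f}, b = {b:.3f}")
```

The fitted w_2 and w_3 come out at essentially zero (up to floating-point error), so the cubic collapses back to the line.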

In (2), the noise-added data, our over-parameterized model y = w_1x + w_2x^2 + w_3x^3 + b does its best to fit every data point, including those that deviate strongly, and the result is a curve. Such a curve is unwanted, because the true underlying model is a line (the dashed line). The fit bends only because of the noise, which means we are fitting the model to the noise. Here, we overfit the data.

In (3), with more noise-added data, our over-parameterized model y = w_1x + w_2x^2 + w_3x^3 + b looks better and closer to a line. Why? Since the noise is random, more data makes it more likely that there are data points to “balance” the strongly deviated ones.

For example, suppose the model wants to bend to the left because there is a strongly deviated point on the left-hand side of the dashed line (our true model). With more data, the model also sees another strongly deviated point on the right-hand side, so it can’t bend too far to the left: it has to take care of the point on the right as well. It ends up somewhere in the middle, which is closer to the dashed line, our true model. Here, we fit the noise much less, and overfitting is reduced!
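The "balancing" argument for (2) and (3) can also be checked numerically. This is only a sketch under my own assumptions: the true line y = 2x + 1, the noise level, the sample sizes and the helper name `mean_abs_high_order` are all hypothetical, and the fit is again `np.polyfit` (closed-form least squares) rather than gradient descent. It averages |w_2| + |w_3| over many random datasets, since those two weights are what bend the curve away from the line.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def mean_abs_high_order(n_points, n_trials=200, noise_std=0.5):
    """Average |w_2| + |w_3| of a cubic fit to noisy samples of y = 2x + 1."""
    total = 0.0
    for _ in range(n_trials):
        x = rng.uniform(-1.0, 1.0, n_points)
        noise = rng.normal(0.0, noise_std, n_points)
        y = 2.0 * x + 1.0 + noise
        # Coefficients come back from the highest power down: [w_3, w_2, w_1, b].
        w3, w2, w1, b = np.polyfit(x, y, deg=3)
        total += abs(w2) + abs(w3)
    return total / n_trials


few = mean_abs_high_order(n_points=10)     # like (2): few noisy points
many = mean_abs_high_order(n_points=1000)  # like (3): many noisy points

print(f"few points:  mean |w_2| + |w_3| = {few:.3f}")
print(f"many points: mean |w_2| + |w_3| = {many:.3f}")
```

With many points the higher-order weights are, on average, much closer to zero, which is the numerical counterpart of the curve staying near the dashed line.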

Cheers!
Raymond
