Problem understanding overfitting

Thrasso00 · September 14, 2022, 8:41pm

Hello, there is something I don’t understand in the explanations about ‘overfitting’.

I don’t understand how we can get a polynomial expression from the gradient descent algorithm. As I understand it, gradient descent applied to a case with only one variable lets us find the values of w and b.
If I’m not wrong, we would never get anything other than a straight line.

In the video called ‘Addressing overfitting’ a polynomial expression of degree 4 is shown. However, I do not understand how we can arrive at such a result with the algorithm we have been taught (gradient descent). We would always get a polynomial of degree 1.

I’m a bit confused about this aspect.

Regards,

rmwkwok · September 15, 2022, 1:09am

Hello @Thrasso00,

There are two steps to understanding this. First, we need multiple linear regression, which is covered in Course 1 Week 2 and is about modeling with more than one feature.

Second, given that we can have more than one feature in a linear regression, even if we have only one feature called x, we can manually create a second feature called x^2 by squaring x, and similarly a third, fourth, and fifth feature by calculating x^3, x^4, and x^5 respectively.

In this way, we have our original x, and self-created x^2, x^3, x^4, and x^5 and we use all five of them in our multiple linear regression y = b + w_1x + w_2x^2 + w_3x^3 + w_4x^4 + w_5x^5, or y = b + w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + w_5x_5 where x_n = x^n.
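To make the two steps concrete, here is a minimal sketch in Python with NumPy. The helper names (`polynomial_features`, `gradient_descent`), the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration, not the course's lab code. It uses a cubic rather than degree 5 so that plain gradient descent converges quickly without feature scaling.

```python
import numpy as np

def polynomial_features(x, degree):
    """Build the columns x, x^2, ..., x^degree from a single feature x."""
    return np.column_stack([x ** n for n in range(1, degree + 1)])

def gradient_descent(X, y, alpha=0.3, iters=20000):
    """Batch gradient descent for multiple linear regression y = X @ w + b."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iters):
        err = X @ w + b - y            # prediction error for each example
        w -= alpha * (X.T @ err) / m   # one update per feature weight
        b -= alpha * err.mean()
    return w, b

# Toy data (hypothetical): a cubic in the single original feature x.
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.5 * x ** 3

# Step 2: create x_1 = x, x_2 = x^2, x_3 = x^3 by hand.
X = polynomial_features(x, 3)

# Step 1: run ordinary multiple linear regression on those columns.
w, b = gradient_descent(X, y)
print("w =", w, "b =", b)
```

The algorithm never sees a "polynomial"; it only sees three columns of numbers and finds one weight per column. The model is still linear in w and b, which is why the same gradient descent applies.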

Review week 2 if you are unsure about multiple linear regression.

Cheers,
Raymond

shanup · September 15, 2022, 6:41am

The key thing to understand from Raymond’s explanation is that x, x^2, x^3, x^4 as features DID NOT come about as a result of gradient descent. These features were selected or created by us and then fed into the gradient descent learning algorithm.

Given these features, gradient descent then finds the optimal values of w_1, w_2, ..., w_n that serve as the coefficients for these features. If we choose to provide a different set of features with another combination of polynomials, then gradient descent will find the weights for the new set of features provided by us.

So, the blame is on us for choosing a set of polynomial degrees as the features, which led to the wiggly-shaped curve.
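A small experiment can illustrate this point. The sketch below runs the same gradient descent routine on three feature sets that we choose (degree 1, 2, and 4) and prints the training cost for each. The function names, the noisy quadratic toy data, and the hyperparameters are assumptions made for illustration.

```python
import numpy as np

def fit_with_gradient_descent(X, y, alpha=0.3, iters=20000):
    """The learning algorithm is identical for every feature set."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.mean()
    return w, b

def training_cost(X, y, w, b):
    """Squared-error cost J = (1 / 2m) * sum((f(x) - y)^2)."""
    err = X @ w + b - y
    return (err @ err) / (2 * len(y))

# Hypothetical data: a quadratic trend plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = x ** 2 + rng.normal(0.0, 0.05, size=x.shape)

costs = {}
for degree in (1, 2, 4):
    # The only thing that changes is which columns WE hand to the algorithm.
    X = np.column_stack([x ** n for n in range(1, degree + 1)])
    w, b = fit_with_gradient_descent(X, y)
    costs[degree] = training_cost(X, y, w, b)
    print(f"degree {degree}: {len(w)} weights, training cost {costs[degree]:.5f}")
```

With only x as a feature the result can never be anything but a straight line. Adding higher powers lets the fit bend, and with enough of them it starts to chase the noise, which is the wiggly curve from the video.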

Thrasso00 · September 18, 2022, 9:20pm

Hello @rmwkwok, @shanup,

I have understood from the explanations given in week 2 that we have to be the ones to choose the new features in order to better fit the prediction.

There are still two things that are not clear to me:

1. Is this feature engineering something we do instinctively or do we have tools to help us?
2. Is there a demonstration of where the expression comes from?

Regards,

shanup · September 18, 2022, 9:56pm

Hello @Thrasso00

The answers to your questions are:

1. It is a trial-and-error process. Knowing the shape of a 2nd- or 3rd-degree polynomial helps, but if the model needs to go beyond those degrees to fit the data better, there is no easy way out: it’s a painful trial-and-error process! The alternative is a more advanced model such as a neural network, where we don’t need to do this feature engineering by hand. The model learns a suitable function from the data during training.
2. This is the regularization term that we include in the cost function to reduce overfitting. It adds a penalty to the cost that grows with the size of the weights, so minimizing the cost keeps the weights under control and curbs their tendency to take on large values.
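As a rough sketch of the second answer, assuming the expression in question is the usual L2 penalty (lambda / 2m) * sum(w_j^2) added to the squared-error cost, the regularized cost and its gradient could be written as follows. The function names and the tiny example values are hypothetical.

```python
import numpy as np

def regularized_cost(X, y, w, b, lambda_=1.0):
    """Squared-error cost plus an L2 penalty on the weights (b is not penalized)."""
    m = X.shape[0]
    err = X @ w + b - y
    squared_error = (err @ err) / (2 * m)
    penalty = (lambda_ / (2 * m)) * (w @ w)
    return squared_error + penalty

def regularized_gradient(X, y, w, b, lambda_=1.0):
    """Gradient of the regularized cost with respect to w and b."""
    m = X.shape[0]
    err = X @ w + b - y
    dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # extra term shrinks each w_j
    dj_db = err.mean()
    return dj_dw, dj_db

# Tiny hypothetical example: two examples, two features.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 1.0])
b = 0.0

cost = regularized_cost(X, y, w, b, lambda_=2.0)
dj_dw, dj_db = regularized_gradient(X, y, w, b, lambda_=2.0)
print(cost, dj_dw, dj_db)
```

The extra (lambda / m) * w term in the gradient pulls every weight toward zero on each update, so a larger lambda damps the high-degree terms more strongly.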

Thrasso00 · September 19, 2022, 5:06am

Dear @shanup,

It is now clearer to me.

Thank you for the clarifications.

shanup · September 19, 2022, 8:27am

You are most welcome @Thrasso00
