# Problem understanding overfitting

Hello, there is something I don’t understand in the explanations about ‘overfitting’.

I don’t understand how we can get a polynomial expression from the gradient descent algorithm. As I understand it, gradient descent applied to a case with only one variable would only find the values of w and b.
If I’m not wrong, we would never get anything other than a straight line.

In the video called ‘Addressing overfitting’, a polynomial expression of degree 4 is shown. However, I do not understand how we can arrive at such a result with the algorithm we have been taught (gradient descent); we would always get a polynomial of degree 1.

Regards,

Hello @Thrasso00,

There are two steps to understanding this. First, we need multiple linear regression, which is covered in Course 1 Week 2 and is about modeling with more than one feature.

Second, given that linear regression can take more than one feature, even if we start with only one feature called x, we can manually create a second feature x^2 by squaring our feature x, and similarly a third, fourth, and fifth feature by calculating x^3, x^4, and x^5 respectively.

In this way, we have our original x and the self-created x^2, x^3, x^4, and x^5, and we use all five of them in a multiple linear regression: y = b + w_1x + w_2x^2 + w_3x^3 + w_4x^4 + w_5x^5, or equivalently y = b + w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + w_5x_5 where x_n = x^n.
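A minimal NumPy sketch of this idea (my own toy data and a plain batch-gradient-descent loop, not code from the course): we build x^2 through x^5 by hand, then let gradient descent fit only the linear parameters w and b.

```python
import numpy as np

# Toy data: one original feature x and a nonlinear target (assumed for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 0.5 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=50)

# Manually create the polynomial features x, x^2, ..., x^5.
X = np.column_stack([x**n for n in range(1, 6)])  # shape (50, 5)

# Ordinary multiple linear regression trained by batch gradient descent.
w = np.zeros(X.shape[1])
b = 0.0
alpha = 0.1
for _ in range(20000):
    err = X @ w + b - y          # prediction error
    w -= alpha * (X.T @ err) / len(y)
    b -= alpha * err.mean()

# Gradient descent itself only updated w and b linearly; the curved shape
# y = b + w_1 x + ... + w_5 x^5 comes entirely from the features we created.
```

Note that the training loop never "discovers" the polynomial; it just finds weights for whatever columns of X we hand it.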

Review Week 2 if you are unsure about multiple linear regression.

Cheers,
Raymond

The key thing to understand from Raymond’s explanation is that x, x^2, x^3, and x^4 as features DID NOT come about as a result of gradient descent. These features were selected or created by us and then fed into the gradient descent learning algorithm.

Given these features, gradient descent then finds the optimal values of w1, w2, ...wn that serve as the coefficients for these features. If we choose to provide a different set of features with another combination of polynomial terms, then gradient descent will find the weights for that new set of features.

So, the blame is on us for choosing that set of polynomial degrees as the features, which led to the wiggly-shaped curve.
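This point can be seen by running one and the same gradient-descent loop on two feature sets we choose ourselves (a sketch with made-up data; the `fit` helper is hypothetical, not from the course): given only x it can only produce a line, given x through x^4 it produces a curve.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + rng.normal(scale=0.05, size=40)  # clearly nonlinear target

def fit(X, y, alpha=0.1, iters=20000):
    """Plain batch gradient descent for multiple linear regression."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / len(y)
        b -= alpha * err.mean()
    return w, b

# Same algorithm, two different feature sets chosen by us:
w1, b1 = fit(x[:, None], y)                                     # degree 1: a line
w4, b4 = fit(np.column_stack([x**n for n in range(1, 5)]), y)   # degrees 1..4: a curve
```

The degree-4 model fits this data much more closely, and that difference is entirely down to the features we supplied, not to anything gradient descent did differently.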

Hello @rmwkwok , @shanup ,

I have understood from the explanations given in Week 2 that we have to be the ones to choose the new features so that the model fits the data better.

There are still two things that are not clear to me:

1. Is this feature engineering something we do instinctively, or are there tools to help us?
2. Is there a demonstration of where the expression comes from?

Regards,

Hello @Thrasso00