Polynomial Regression with 'Linear' Gradient Descent: C1_W2_Lab04_FeatEng_PolyReg

In Week 2, Lab 4 we just add pow(x,2) and pow(x,3) columns to the X matrix as feature columns, the same way we would add linear features, and then pass it on to the gradient descent algorithm.

Could you clarify why this works? It seems like it shouldn't, since gradient descent was derived assuming a linear base function f_wb(x), a cost function J(w,b) built from it, and w, b updates at each GD step obtained by taking partial derivatives of that linear f_wb(x).

For instance, would the update term for w3, the parameter of the pow(x,3) feature, after taking the partial derivative with respect to w3, just reduce to

$$
w_3 := w_3 - \frac{\alpha}{m}\sum_{i=1}^{m}\Big(f_{w,b}\big(x^{(i)}\big) - y^{(i)}\Big)\big(x^{(i)}\big)^{3}
$$

where the (x^(i))^3 value is simply read from the X matrix, as if it were just another linear feature?
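My reasoning, spelled out (this is my own shorthand, treating the cubed column as a feature in its own right): since the model is f_wb(x) = w1·x + w2·x² + w3·x³ + b,

$$
\frac{\partial f_{w,b}(x)}{\partial w_3} = x^{3},
$$

so taking the partial derivative of J(w,b) with respect to w3 leaves exactly the sum above, with the cubed values taken straight from the stored column.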

So, in effect, the GD algorithm doesn’t need to be told that this or that feature is a non-linear term of a specific power?

A “linear function” does not mean that it forms only a straight line.

“Linear” in this context means a linear combination of the weights and features: each feature is multiplied by its own weight, and the products are added together. There can be multiple features, and each will have its own weight.
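Concretely, with the lab's cubic example (writing each engineered column as its own feature; the notation here is mine):

$$
f_{w,b}(\mathbf{x}) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b,
\qquad x_1 = x,\quad x_2 = x^2,\quad x_3 = x^3
$$

The right-hand side is linear in w1, w2, w3, and b regardless of how the columns x1, x2, x3 were produced.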

Once you compute the squares and cubes of the feature values, they are really just new data. How you computed them has no impact on the process of linear regression.
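A minimal sketch of that idea (not the lab's code; the data and variable names are made up, and I'm assuming NumPy):

```python
import numpy as np

# Toy data from a cubic target; the values are made up, not the lab's
x = np.arange(0, 20, 1.0)
y = 1 + x**2 + 0.5 * x**3

# Engineered features: the squared and cubed columns are just extra data columns
X = np.c_[x, x**2, x**3]

# Z-score normalization so a single learning rate works for all columns
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mu) / sigma

# Plain linear-regression gradient descent; it never "knows" the columns are powers of x
m, n = X_norm.shape
w = np.zeros(n)
b = 0.0
alpha = 0.1

for _ in range(10_000):
    err = X_norm @ w + b - y            # f_wb(x_i) - y_i for every example
    w -= alpha * (X_norm.T @ err) / m   # same update rule as for purely linear features
    b -= alpha * err.sum() / m

print("w:", w, "b:", b)
```

The gradient descent loop is identical to the purely linear case; only the contents of X changed.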
