In this lab, under the heading "Polynomial Features", we start with the equation y = 1 + x^2, and the fit comes out as a straight line. But doesn't that line correspond to the equation y = wx + b? Why did we use that quadratic equation to start with? The only thing it seems to help with is generating y, which is the training data. Then in the second part it says, "What is needed is something like y = w_0x_0^2 + b", but isn't that function the same as the quadratic equation above, with w = 1 and b = 1? Here, though, you generate y for the training data and then also square x, so why wasn't x squared in the first case above?
It's confusing. I understand that you used feature engineering in the second case, but it looks like we purposely generated the training labels y from the quadratic equation and then fed the raw x into a linear function just to prove a point.
Also, you used a quadratic function just to feature-engineer, but the model still uses a linear function, doesn't it? If you check the code for run_gradient_descent_feng, you will eventually see that it uses a cost function based on np.dot(X[i], w) + b. Is that correct? Don't you also have to change that code?
As I read your post, I feel that you are actually on top of what’s going on in that lab. Perhaps what I can do here is to explain the intention behind what you have seen:
You are correct that we generate the labels y with y = 1 + x^2 and then demonstrate the need to engineer a new feature x^2 from x in order to fit the samples well to the labels. It looks like we created a problem ourselves, pretended we didn't know about x^2 in the first place, and then pretended to solve it by adding x^2 back. Is all of this "pretending" the source of your confusion?
If so, then I would say yes, we are actually pretending, and the reason is really, as you put it, to "prove a point". In a real-world problem, the so-called "label generation process" y = 1 + x^2 would have been hidden from us, but we chose to reveal it to you to convince you that if the generation process is non-linear, we need to engineer some polynomial features for the model to fit well. The lab also showed how to visualize the non-linearity, and how to decide that x^2 is better than x^3. The key here is the visualization process, because we already knew in advance that x^2 is the best choice.
If you find it confusing, then what if you "pretended" that we hadn't revealed y = 1 + x^2 to you in the first place? Would it be less confusing then? You would have to figure out the x^2 term all by yourself.
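To make the flow concrete, here is a minimal numpy sketch of what the lab does. Note that fit_linear is a hypothetical stand-in I'm using for the lab's run_gradient_descent_feng; I've used ordinary least squares, which converges to the same solution gradient descent would:

```python
import numpy as np

def fit_linear(X, y):
    # Hypothetical stand-in for run_gradient_descent_feng: ordinary
    # least squares for y ~ X @ w + b, reaching the same optimum.
    A = np.c_[X, np.ones(len(X))]            # extra column for the bias b
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]               # (w, b)

# The "hidden" label-generation process we pretend not to know.
x = np.arange(0, 20, 1)
y = 1 + x**2

# Attempt 1: raw feature x. The best w, b still give a straight line,
# which cannot follow the curve in y.
w, b = fit_linear(x.reshape(-1, 1), y)

# Attempt 2: engineered feature x^2. The same linear model now fits,
# recovering w ~ 1 and b ~ 1, i.e. y = 1*x^2 + 1.
w, b = fit_linear((x**2).reshape(-1, 1), y)
print(w, b)
```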
Yes, it is still a linear function (of features, some of which happen to be polynomials of the others), and that's why we don't have to change that code.
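For reference, the cost computation you are asking about looks roughly like this (paraphrased from memory, not copied from the lab). Notice that nothing in it knows, or needs to know, whether a column of X holds a raw x or an engineered x^2:

```python
import numpy as np

def compute_cost(X, y, w, b):
    # Squared-error cost for the linear model f(x) = np.dot(x, w) + b.
    # It is indifferent to whether X's columns are raw or engineered.
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b     # the same linear prediction
        cost += (f_wb_i - y[i]) ** 2
    return cost / (2 * m)
```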
So essentially you are not changing the cost function or the gradient descent; you are just changing the data, by sending an engineered feature into the same model, which looks like f(x) = wx + b? So polynomial regression is essentially linear regression with engineered features?
Another, probably silly, question: if wx + b denotes a straight line, which is our model and also drives the cost function, then how come the line curves when you send in additional data? I'm having difficulty wrapping my mind around this.
We are taking y = w_1x + w_2x^2 + w_3x^3 + b and reading it as y = w_1x_1 + w_2x_2 + w_3x_3 + b, where x_1 = x, x_2 = x^2, and x_3 = x^3. The former looks polynomial, whereas the latter looks linear.
If we reverse what I have just said, and reveal that y = w_1x_1 + w_2x_2 + w_3x_3 + b is actually y = w_1x + w_2x^2 + w_3x^3 + b, then the former looks linear and the latter looks polynomial, right?
In the former case, we can plot y against x_1, x_2, and x_3. Although we can't actually visualize it, because it would be a 4D plot, the resulting "plot" is not curved: y is linear in each of the dimensions x_1, x_2, and x_3.
In the latter case, however, we plot y against x alone, and the plot is curved because y depends on x^2 and x^3 as well.
Do you see the difference in what we are plotting? In the former case we plot y against x_1, x_2, and x_3, and y is linear in each of them. In the latter case we plot y against x, and y is not linear in x, because of the higher-order terms.
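Here is the same idea in a short numpy sketch (the feature stacking with np.c_ mirrors what the lab does; the least-squares fit is again my stand-in for gradient descent). The model only ever sees three ordinary columns and treats them linearly, yet the prediction plotted against the original x comes out curved:

```python
import numpy as np

x = np.arange(0, 20, 1)
y = 1 + x**3                     # some non-linear target

# Engineer x_1 = x, x_2 = x^2, x_3 = x^3 as three ordinary columns.
X = np.c_[x, x**2, x**3]         # shape (20, 3)

# Fit y_hat = w_1*x_1 + w_2*x_2 + w_3*x_3 + b, purely linear in X.
A = np.c_[X, np.ones(len(X))]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

y_hat = X @ w + b
# Against (x_1, x_2, x_3), y_hat is a flat hyperplane in 4D;
# against the single original x, the same y_hat traces a cubic curve.
```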