I have a question on nomenclature. In these classes we learn about linear regression, where we are trying to predict a value y from a single feature x, and the model is of the form:

y = w * x + b

It’s clear why this particular form is called “linear” regression. y is linear with respect to x.

We also talk about adding polynomial features, so that the model might look like

y = w1 * x + w2 * x^2 + b

We still refer to this as a linear model, but the relationship between x and y is no longer linear.

My question is: when we add these new polynomial terms, why don't we refer to this as polynomial regression, or use a different name? There must be some history behind the name that I have missed.

“linear” refers not to the shape of the f_wb curve, but to the fact that f_wb is built as a linear combination of the weights and features (f_wb = w*x + b). That's a linear relationship between w, b, and x.
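A minimal numpy sketch of that point (the data values and weights here are made up for illustration): once the features are computed, the prediction is just a dot product of the feature matrix with the weight vector, so it is linear in w and b regardless of how the features were built.

```python
import numpy as np

# Engineered features: the second column is x^2, a nonlinear
# function of x, but the model below is still linear in w and b.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([x, x**2])

w = np.array([0.5, 2.0])  # illustrative weights
b = 1.0

f_wb = X @ w + b          # a linear combination of weights and features
```

Doubling w here would exactly double (f_wb - b), which is what "linear in the parameters" means; doubling x would not, because of the x^2 column.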

In your example, x^2 is considered an additional engineered feature.

Does this mean we would consider an example like “y = w1 * x + w2 * x^2 + b” to not be linear regression? Since there is no longer a linear relationship between w and b and x?

Maybe an example of something that is non-linear is what I am looking for.

It’s still linear regression, because all you’ve done is add another feature.
Once you compute x^2, it’s just a feature (a specific real value) just like any other feature.
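A quick sketch of that idea (the true coefficients 3, 2, 1 are arbitrary choices for the example): treat x^2 as just another column in the design matrix and solve ordinary least squares with numpy, exactly as you would for any other set of features.

```python
import numpy as np

# Data generated from y = 3x + 2x^2 + 1 (noise-free for clarity).
x = np.linspace(-2, 2, 50)
y = 3 * x + 2 * x**2 + 1

# Design matrix: columns are the features x, x^2, and a constant for b.
A = np.column_stack([x, x**2, np.ones_like(x)])

# Plain linear least squares recovers the coefficients.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w1, w2, b = coef
```

Nothing in the solver knows or cares that the second column happened to be computed as x squared.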

Hi justin_Ko,
Your question is very interesting, and this is an aspect that is not often explained. Once the feature x is instantiated (or mapped) with its value in the dataset, we have to see it as a simple value, forgetting how it was created and the function that was applied. The variables in this equation are the weights, which evolve during the training phase.

y = w_1 \cdot x_1 + w_2 \cdot x_2 + b, which is a linear model y = X \cdot w

with your matrix X, consisting of your features

x_1 = f_1(x) = x and

x_2 = f_2(x) = x^2.

Your features are well defined, and you learn your weights by fitting the model.

In general, f(x) could be any suitable nonlinear function or model for each feature, used to encode domain knowledge! This strategy amounts to modelling the nonlinearity of your problem in the features themselves. (Of course, f(x) could also just be a linear function, as in f_1(x).)
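For instance (a sketch with made-up data; sin and cos are one possible choice of feature maps f_1, f_2, e.g. to encode known periodicity): the matrix X is fixed once the features are computed, so fitting remains a linear problem in w.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 100)
y = 1.5 * np.sin(x) + 0.5 * np.cos(x)   # toy target

# Feature maps: x_1 = f_1(x) = sin(x), x_2 = f_2(x) = cos(x).
X = np.column_stack([np.sin(x), np.cos(x)])

# Linear least squares on the nonlinear features.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```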

I was wondering about the same question and found that Lab 4 of Week 2 is quite helpful in explaining this concern.
It uses polynomial features, and we are still using a linear regression model to train.

An Alternate View
Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example.
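The lab's point can be sketched numerically (assuming a toy target y = 2·x² purely for illustration): the engineered feature x² is perfectly linear relative to this target, while the raw feature x is not.

```python
import numpy as np

x = np.linspace(-3, 3, 61)
y = 2 * x**2                            # toy quadratic target

# Correlation of each candidate feature with the target:
corr_x  = np.corrcoef(x,    y)[0, 1]    # near 0 (y is symmetric in x)
corr_x2 = np.corrcoef(x**2, y)[0, 1]    # essentially 1 (perfectly linear)
```

So the "best" feature is the one whose scatter plot against y is a straight line.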

I still need to understand what non-linear regression is, though, to complete the picture.
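One standard example of genuinely non-linear regression (a sketch using scipy.optimize.curve_fit; the model y = a·exp(b·x) and the values 2.0 and -1.5 are made up for the example): here the parameter b sits inside the exponential, so no fixed feature transformation can make the model linear in its parameters, and an iterative nonlinear solver is needed.

```python
import numpy as np
from scipy.optimize import curve_fit

# Model that is nonlinear in its parameters: b appears inside exp().
def model(x, a, b):
    return a * np.exp(b * x)

x = np.linspace(0, 1, 30)
y = model(x, 2.0, -1.5)                  # noise-free data for clarity

# Nonlinear least squares, starting from an initial guess p0.
popt, _ = curve_fit(model, x, y, p0=[1.0, -1.0])
a_hat, b_hat = popt
```

(Log-transforming y would linearize this particular model, but that changes the noise assumptions, which is why it is still treated as the canonical non-linear case.)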

cos(x) might be a nice way to generate some new features, but you would not call them ‘y’.
‘y’ is used for the labels of the dataset.

Recall that when you’re doing regression, you just have some input features and some output labels. You don’t know how they were generated. So you don’t really want to try to specify the form of the output. You want to create more complicated features (from non-linear processes) so that the model can be more complex.