Polynomial Regression

Hello, I just finished the first course of the specialization, and I would like to understand clearly how polynomial regression works. I get why we might need polynomial features, but I don't understand how I should change my model to apply them in practice. If I have 4 features in a LinearRegression model, my old model looks like this: w1x1+w2x2+w3x3+w4x4+b. Let's say I decide to use x3**3. Does this mean my new model will look something like w1x3+w2x3^2+w3x3^3+b? Should I drop the x1, x2 and x4 features from my new model? Also, in LogisticRegression I noticed that it uses all possible combinations of features and powers, so if I had 2 features and wanted a polynomial degree of 6, I would have 27 features in the new model. I would also like to clarify how that works. Could you help me understand this clearly, or point me to useful articles/videos?

Hello!
For the polynomial part: if your old model is y = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b and you want to add an x_3^3 term, you can always generate a new feature, say x_5 = x_3^3, and put it in the model. The new model is then y = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + w_5x_5 + b, so you still keep the features x_1, x_2 and x_4.
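Here is a minimal numpy sketch of that idea, assuming hypothetical random data with 4 features. The cubed column is simply appended as a fifth feature, and because the model is still linear in the weights, the same linear-regression fit (here a least-squares solve standing in for gradient descent) is run on the wider matrix.

```python
import numpy as np

# Hypothetical data: 20 examples with 4 features (x1, x2, x3, x4).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# Engineer x5 = x3**3 (x3 is column index 2) and append it.
# x1, x2, x3 and x4 are all kept.
x5 = X[:, 2] ** 3
X_poly = np.column_stack([X, x5])
print(X_poly.shape)  # (20, 5)

# Synthetic targets from known weights, only to show the fit works as usual.
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.25])
true_b = 4.0
y = X_poly @ true_w + true_b

# Ordinary least squares with an extra column of ones for the bias b.
A = np.column_stack([X_poly, np.ones(len(X_poly))])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:-1], params[-1]
print(w, b)  # recovers true_w and true_b (noiseless data)
```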
You mentioned that in logistic regression they use all possible combinations of features and powers. I think this may be related to an extension of the Effect Hierarchy Principle, which basically says that higher-order effects are usually of less interest than the main effects or lower-order interactions.


You can add as many new polynomial combinations as you need to get a more complicated model.

Adding only the power terms (x^2, x^3, etc.) uses just a subset of those polynomial combinations.
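To make the "subset" point concrete, here is a small pure-Python sketch for 2 features up to degree 3. Each term is written as a tuple of feature indices, so (0, 1, 1) means x_0 * x_1^2. The helper names `all_terms` and `power_terms` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations_with_replacement

def all_terms(n_features, degree):
    """Every monomial of total degree 1..degree, as a tuple of feature indices."""
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(n_features), d))
    return terms

def power_terms(n_features, degree):
    """Only powers of a single feature: x_i, x_i^2, ..., x_i^degree."""
    return [(i,) * d for i in range(n_features) for d in range(1, degree + 1)]

full = all_terms(2, 3)
powers = power_terms(2, 3)
print(len(full), len(powers))    # 9 6
print(set(powers) <= set(full))  # True: power terms are a subset
```

The 3 extra terms in the full set are the cross terms x_0x_1, x_0^2x_1 and x_0x_1^2.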

The key trick is to add only as much complexity as you need to get "good enough" performance for the problem you're solving.
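One common way to judge "good enough" is to compare error on held-out data as the degree grows. The sketch below assumes synthetic quadratic data and an arbitrary rule (pick the lowest degree whose validation error is within 10% of the best); both are illustrative choices, not a prescribed method.

```python
import numpy as np

# Synthetic data (assumed for illustration): a quadratic plus a little noise.
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
y = 1.5 * x**2 - x + 2 + rng.normal(scale=0.5, size=x.size)

# Hold out the last 50 examples as a validation set.
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

val_mse = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)  # fit on training data only
    pred = np.polyval(coeffs, x_val)
    val_mse[degree] = np.mean((pred - y_val) ** 2)

# Assumed rule: the simplest degree within 10% of the best validation error.
best = min(val_mse.values())
chosen = min(d for d, mse in val_mse.items() if mse <= 1.10 * best)
print(val_mse)
print("chosen degree:", chosen)
```

A straight line (degree 1) cannot follow the curve, so its validation error is far higher; beyond the true degree, extra terms add complexity without a real gain.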

I am wondering: in ML, if one includes a higher-order polynomial term for a feature (e.g., X_1^3) in the chosen model, should one also include all the lower-order terms for that feature (i.e., X_1^2 and X_1), as one should in statistics (i.e., y = X_1 + X_1^2 + X_1^3), or can one have just X_1^3 in the model (i.e., y = X_1^3)?

Hello Marie @Marie_T,

No, it is not necessary to keep the lower-order terms. For example, it is completely fine to have y = x_1x_2^2, period.
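As a quick sanity check, here is a sketch assuming synthetic, noiseless data generated from y = 2·x_1x_2^2 + 1. The only feature given to the model is x_1 * x_2**2, with no x_1, x_2 or x_2^2 columns, and a least-squares fit still recovers the weight and bias.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(-2, 2, size=50)
x2 = rng.uniform(-2, 2, size=50)

# Synthetic target that depends only on the interaction x1 * x2**2.
y = 2.0 * x1 * x2**2 + 1.0

# The single engineered feature, plus a column of ones for the bias.
feature = x1 * x2**2
A = np.column_stack([feature, np.ones_like(feature)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)  # w ≈ 2, b ≈ 1
```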

Cheers,
Raymond


If you don’t know which terms are going to be useful, it’s a good idea to use all the orders at least initially.
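A sketch of that approach, assuming synthetic noiseless data where only the cubic term matters: start with x, x^2 and x^3 together, and let the fitted weights show which terms the data actually uses. With real, noisy data the unneeded weights would be small rather than exactly zero, and validation or regularization would help decide what to drop.

```python
import numpy as np

x = np.linspace(-2, 2, 30)
y = 0.5 * x**3  # synthetic target: only the cubic term is present

# Include every order up to 3, plus a bias column.
A = np.column_stack([x, x**2, x**3, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Weights for x, x^2 and the bias come out (numerically) zero; x^3 gets 0.5.
print(np.round(coef, 6))
```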

Hi @saba_odisharia,

To answer your second question, let's say you have two features x and y, and you want to add polynomial features up to degree 6. Then you would have:

  • degree 1 terms x,y
  • degree 2 terms x^2,xy,y^2
  • degree 3 terms x^3,x^2y,xy^2,y^3
  • degree 4 terms x^4,x^3y,x^2y^2,xy^3,y^4
  • degree 5 terms x^5,x^4y,x^3y^2,x^2y^3,xy^4,y^5
  • degree 6 terms x^6,x^5y,x^4y^2,x^3y^3,x^2y^4,xy^5,y^6
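The list above can be generated with two nested loops: for each total degree i from 1 to 6, take x^(i-j) * y^j for j = 0..i. Below is a numpy sketch of that; the helper name `polynomial_terms` and the sample values are hypothetical. If you use scikit-learn, `PolynomialFeatures(degree=6, include_bias=False)` produces the same 27 terms.

```python
import numpy as np

def polynomial_terms(x, y, degree=6):
    """Return every term x^(i-j) * y^j for total degree i = 1..degree,
    in the same order as the list above (x, y, x^2, xy, y^2, ...)."""
    columns = []
    for i in range(1, degree + 1):
        for j in range(i + 1):
            columns.append(x ** (i - j) * y ** j)
    return np.column_stack(columns)

# Three hypothetical examples with two features each.
x = np.array([0.5, -1.0, 2.0])
y = np.array([1.5, 0.25, -0.5])

features = polynomial_terms(x, y)
print(features.shape)  # (3, 27): 2 + 3 + 4 + 5 + 6 + 7 terms per example
```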

In total that is 27 features. Does this clear things up?

Best,
Alex
