Hi, I am completing the third week of the course on supervised learning, and there was a lecture where we saw that we can use different powers of x in linear regression to make the model fit more accurately. I didn’t really understand that part. Can someone explain it in more detail?
Thank you
Hello @kaki178925!
Are you talking about polynomials, where we include higher-order terms in the linear function? They can help capture non-linear relationships between the input and the output. For example, adding a quadratic term x^2 lets the model fit a parabolic curve.
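Here is a minimal sketch of that idea, assuming synthetic data that follows a parabola plus a little noise (the data and variable names are made up for illustration). It uses NumPy's `np.polyfit` to compare a degree-1 fit with a degree-2 fit. The model is still linear in its weights; only the features are non-linear in x.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a parabola plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# np.polyfit returns coefficients from the highest power down to the constant.
coef_line = np.polyfit(x, y, deg=1)  # [w1, b]
coef_quad = np.polyfit(x, y, deg=2)  # [w2, w1, b]

# Mean squared error of each fit on the training data.
mse_line = np.mean((np.polyval(coef_line, x) - y) ** 2)
mse_quad = np.mean((np.polyval(coef_quad, x) - y) ** 2)

print(f"straight line MSE: {mse_line:.3f}")
print(f"with x^2 term MSE: {mse_quad:.3f}")
```

The straight line cannot bend, so its error stays large, while the fit with the x^2 term should land close to the noise level.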
I don’t remember whether MLS course 1 covers the ReLU activation function. It is a non-linear function, and in practice we mostly use it instead of polynomials. I am sure you will learn more about it in MLS course 2.
Best,
Saif.
Hi @kaki178925
Sometimes fitting a straight line or a plane (with a linear model) is just not sufficient to model the relationship between your features and your label, especially if the relationship is a bit more complex.
To describe the non-linearity, feature engineering can do the trick. It can involve:
- different powers of x, as you mentioned
- feature crosses, see also this thread: Example of encoding the non linearity using feature crossing - #2 by Christian_Simonis
- or some other kind of domain knowledge that you encode in your features
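To make the feature-cross idea concrete, here is a minimal sketch, assuming made-up data where the label is the product of two features (think of an area computed from width and height). The data and the helper `fit_mse` are hypothetical and only for illustration.

```python
import numpy as np

# Made-up data (assumption): the label is the product of the two features.
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, 200)  # e.g. width
x2 = rng.uniform(0, 5, 200)  # e.g. height
y = x1 * x2                  # e.g. area

def fit_mse(features, y):
    """Least-squares linear fit with an intercept; returns the mean squared error."""
    A = np.column_stack([np.ones(len(y))] + list(features))
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ w - y) ** 2))

mse_plain = fit_mse([x1, x2], y)           # plane through the raw features
mse_cross = fit_mse([x1, x2, x1 * x2], y)  # same linear model plus the cross x1*x2

print(f"MSE without cross: {mse_plain:.3f}")
print(f"MSE with cross:    {mse_cross:.3e}")
```

A plane over x1 and x2 cannot represent their product, so the first error stays clearly above zero. Once the crossed feature is available, the same linear model can fit this data essentially exactly.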
When you have a manageable number of features, it always helps to analyze the residuals to see where systematic patterns remain that you could exploit in your feature engineering. See also:
- this thread: How to evaluate accuracy of a regression model - #21 by Christian_Simonis
- a coding example
In a perfect world, you would see only a random (Gaussian) distribution in your residuals and no systematic patterns.
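A minimal sketch of such a residual check, assuming synthetic data with a quadratic relationship (the data, the helper `residuals`, and the three-bin summary are my own choices for illustration, not a standard recipe):

```python
import numpy as np

# Synthetic data (assumption): the label depends on x^2, plus Gaussian noise.
rng = np.random.default_rng(2)
x = np.linspace(0, 4, 80)  # already sorted, so bins below follow x
y = 3.0 + 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)

def residuals(features, y):
    """Fit y on the given feature columns (plus intercept) and return y - prediction."""
    A = np.column_stack([np.ones(len(y))] + list(features))
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ w

res_line = residuals([x], y)        # straight-line model
res_quad = residuals([x, x**2], y)  # model with the engineered x^2 feature

# Mean residual in the low, middle, and high range of x.
means_line = [float(np.mean(part)) for part in np.array_split(res_line, 3)]
means_quad = [float(np.mean(part)) for part in np.array_split(res_quad, 3)]

print("line fit, mean residual per bin:", np.round(means_line, 2))
print("quad fit, mean residual per bin:", np.round(means_quad, 2))
```

For the straight line, the bin means should come out positive, negative, positive: a U-shaped pattern that hints at a missing x^2 feature. With x^2 included, the bin means should hover near zero, which is what unstructured residuals look like.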
Since your question also touches upon logistic regression, maybe this thread could be interesting for you, too: Can logistic regression be replaced with ordinary linear regression - #16 by Christian_Simonis
Hope that helps!
Best regards
Christian
Simple example:
Say you do an experiment where you drop a ball from a building, and record its height vs time.
You have only two pieces of data, the elapsed time ‘t’, and the height of the ball ‘d’.
From classical physics, the height of the ball is going to be related to the square of the elapsed time, via the equation d = 1/2 * a * t^2.
But your data set doesn’t have t^2, it has only t.
If you try to create a linear model by plotting d vs. t, you won’t get a very good fit.
So in order to get a more expressive model, you can create a new feature t^2, and train your system using both t and t^2.
Now you’ll get a perfect match. But if you didn’t know anything about physics, you would not know that the key insight was using t^2. You’d discover that by creating additional features.
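The ball-drop example can be sketched in a few lines, assuming noise-free measurements and a = 9.81 m/s^2 (both are assumptions for illustration, as are the variable names and the helper `fit`):

```python
import numpy as np

a = 9.81                   # assumed gravitational acceleration in m/s^2
t = np.linspace(0, 3, 31)  # elapsed time in seconds
d = 0.5 * a * t**2         # d = 1/2 * a * t^2, noise-free by assumption

def fit(features, target):
    """Least-squares linear fit with an intercept; returns (weights, MSE)."""
    A = np.column_stack([np.ones(len(target))] + list(features))
    w, *_ = np.linalg.lstsq(A, target, rcond=None)
    return w, float(np.mean((A @ w - target) ** 2))

w_t, mse_t = fit([t], d)          # model using only t
w_t2, mse_t2 = fit([t, t**2], d)  # model using t and the new feature t^2

print(f"only t:     MSE = {mse_t:.3f}")
print(f"t and t^2:  MSE = {mse_t2:.3e}")
print("weights [b, w_t, w_t2]:", np.round(w_t2, 3))
```

With both features, the fit should put almost all the weight on t^2 (close to a/2) and almost none on t, which is how the data itself points you to the t^2 relationship.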
Thank you for the detailed answers, I understand it now.