Question on Linear and logistic regression

Hi, I am completing the third week of the course on supervised learning, and there was a lecture where we saw that we can use different powers of x in linear regression to make the model fit more accurately. I didn’t really understand that part. Can someone explain it in more detail?
Thank you

Hello @kaki178925!

Are you talking about polynomials, where we include higher-order terms in the linear function? They can help capture non-linear relationships between the input and the output. For example, a quadratic term lets us fit a parabolic curve.
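
As a minimal sketch (the symbols w_1, w_2 and b are assumed here for illustration, not quoted from the lecture), a quadratic model with a single input x can be written as:

```latex
f_{w,b}(x) = w_1 x + w_2 x^2 + b
```

The model is non-linear in x but still linear in the parameters w_1, w_2 and b, so treating x^2 as one more feature lets ordinary linear regression fit it.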

I don’t remember whether MLS course 1 covers the ReLU activation function or not. It is a non-linear function, and in neural networks we mostly use it instead of polynomials. I am sure you will learn more about it in MLS course 2.

Best,
Saif.

Hi @kaki178925

Sometimes fitting a straight line or a plane (with a linear model) is just not sufficient to model the relationship between your features and your label, especially if the relationship is a bit more complex.

To describe the non-linearity, feature engineering can do the trick, which can involve:

When you have a manageable number of features, it always helps to analyze the residuals to see where systematic patterns remain that you could potentially exploit in your feature engineering, see also:

In a perfect world, you would just see a random (Gaussian) distribution in your residuals and no systematic patterns.
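
The residual check described above can be sketched as follows. This is only an illustration on synthetic data: the data-generating formula, the noise level, and the helper name `residuals` are all assumptions made for the example.

```python
import numpy as np

# Synthetic data (assumed for illustration): a quadratic trend plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 200)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.1, size=x.size)

def residuals(features, y):
    """Least-squares fit with an intercept; returns y minus the prediction."""
    X = np.column_stack([features, np.ones(len(y))])  # append a bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ w

res_linear = residuals(x.reshape(-1, 1), y)           # model uses x only
res_quad = residuals(np.column_stack([x, x**2]), y)   # model uses x and x^2

# A strong correlation between residuals and a candidate feature signals
# a systematic pattern that the current model has not captured.
corr_linear = np.corrcoef(res_linear, x**2)[0, 1]
corr_quad = np.corrcoef(res_quad, x**2)[0, 1]

print("residuals vs x^2, linear model:   ", corr_linear)
print("residuals vs x^2, quadratic model:", corr_quad)
```

With the x-only model, the residuals are strongly correlated with x^2, which hints that x^2 is worth adding as a feature. Once it is added, what is left should look like noise.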

Since your question also touches upon logistic regression, maybe this thread could be interesting for you, too: Can logistic regression be replaced with ordinary linear regression - #16 by Christian_Simonis

Hope that helps!

Best regards
Christian

Simple example:
Say you do an experiment where you drop a ball from a building, and record its height vs time.

You have only two pieces of data, the elapsed time ‘t’, and the height of the ball ‘d’.

From classical physics, the height of the ball depends on the square of the elapsed time: measured as the distance fallen from the release point (starting from rest), d = 1/2 * a * t^2.

But your data set doesn’t have t^2, it has only t.

If you try to create a linear model by plotting d vs. t, you won’t get a very good fit.

So, in order to get a more flexible model, you can create a new feature t^2, and train your system using both t and t^2.

Now you’ll get a near-perfect match (an exact one, if the measurements are noise-free). But if you didn’t know anything about physics, you would not know that the key insight was using t^2. You’d discover that by creating additional features and seeing which ones improve the fit.
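
Here is a minimal sketch of the ball-drop example in Python with NumPy. The measurements are simulated and noise-free, and the value of a, the time range, and the helper name `fit_and_score` are assumptions made for illustration.

```python
import numpy as np

# Simulated, noise-free measurements (assumed): d = 1/2 * a * t^2
a = 9.81                        # assumed acceleration in m/s^2
t = np.linspace(0.0, 3.0, 31)   # elapsed time in seconds
d = 0.5 * a * t**2              # distance fallen in metres

def fit_and_score(X, y):
    """Least-squares fit with an intercept; returns (weights, max abs error)."""
    X1 = np.column_stack([X, np.ones(len(y))])  # append a bias column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w, np.max(np.abs(X1 @ w - y))

# Model 1: only t as a feature
w_lin, err_lin = fit_and_score(t.reshape(-1, 1), d)

# Model 2: t and the engineered feature t^2
w_quad, err_quad = fit_and_score(np.column_stack([t, t**2]), d)

print("t only:    max error =", err_lin)
print("t and t^2: max error =", err_quad)
print("learned weight on t^2:", w_quad[1], "(compare with a/2 =", 0.5 * a, ")")
```

The straight-line model is off by several metres at its worst point, while the model with t^2 reproduces the data almost exactly and recovers a weight on t^2 close to a/2.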

Thank you for the detailed answers, I understand it now.