In linear regression do we care about independent variable distribution?

Tom_Niesytto · December 24, 2022, 6:17am

Let's say most of the values cluster around a similar set of values, and there are a few values outside. The cost function averages the squared error, so even a large contribution from the outliers will not affect it much. So the line will most likely predict wrong values once we try to use it away from the cluster.

TMosh · December 24, 2022, 6:30am

One solution to this issue is to create a more complex set of features (using non-linear combinations, or exponents), so that those clusters can be more closely modeled.
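A minimal sketch of that idea, using NumPy only. The data is made up for illustration (a tight cluster of x values plus a few far-away points, with an assumed curved relationship), and the helper names `fit_least_squares` and `mse` are hypothetical, not from the course labs. It fits the same least-squares model twice, once on `x` alone and once on `x` plus `x**2`, and compares the training cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: most x values cluster in [1, 2], a few lie far outside.
x_cluster = rng.uniform(1.0, 2.0, size=50)
x_far = np.array([8.0, 9.0, 10.0])
x = np.concatenate([x_cluster, x_far])
# Assumed curved relationship plus a little noise.
y = 0.5 * x**2 + rng.normal(0.0, 0.1, size=x.size)


def fit_least_squares(features, targets):
    """Return the weights (bias first) that minimize mean squared error."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def mse(features, targets, weights):
    """Mean squared error of the fitted model on the given data."""
    design = np.column_stack([np.ones(len(features)), features])
    return float(np.mean((design @ weights - targets) ** 2))


linear_features = x.reshape(-1, 1)              # feature: x
quadratic_features = np.column_stack([x, x**2])  # features: x and x^2

w_lin = fit_least_squares(linear_features, y)
w_quad = fit_least_squares(quadratic_features, y)

mse_linear = mse(linear_features, y, w_lin)
mse_quadratic = mse(quadratic_features, y, w_quad)
print(f"linear MSE: {mse_linear:.4f}, quadratic MSE: {mse_quadratic:.4f}")
```

The model is still linear in its weights; only the feature set changed. A lower training cost on its own does not show that the extra feature generalizes, so this is a starting point rather than a verdict.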

shanup · December 24, 2022, 7:10am

As @TMosh mentioned, we can use a more complicated set of features to capture those outliers.

On the flip side, let's also keep an eye on overfitting: should these outliers be treated as anomalies and ignored by the model, or are they worth the extra effort of a more complicated model that can predict them correctly as well?
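One rough way to inform that decision is to fit the line with and without the suspected outliers and see how much the fit moves. The sketch below uses made-up, noise-free data (a cluster lying exactly on y = 2x + 1 and two far-away points that do not follow it); the variable names are illustrative only.

```python
import numpy as np

# Hypothetical cluster: 20 points lying exactly on y = 2x + 1.
x_cluster = np.linspace(1.0, 2.0, 20)
y_cluster = 2.0 * x_cluster + 1.0

# Hypothetical outliers: far from the cluster in x, well below the cluster's line.
x_out = np.array([9.0, 10.0])
y_out = np.array([5.0, 5.0])

x_all = np.concatenate([x_cluster, x_out])
y_all = np.concatenate([y_cluster, y_out])

# np.polyfit with degree 1 returns the least-squares [slope, intercept].
slope_cluster, intercept_cluster = np.polyfit(x_cluster, y_cluster, 1)
slope_all, intercept_all = np.polyfit(x_all, y_all, 1)

print(f"cluster only: slope={slope_cluster:.3f}, intercept={intercept_cluster:.3f}")
print(f"with outliers: slope={slope_all:.3f}, intercept={intercept_all:.3f}")
```

In this toy setup the two far-away points pull the slope well below the cluster's slope of 2, because squared error weights large residuals heavily. If the fit shifts this much, it is worth checking whether those points are measurement errors or a real part of the relationship before choosing between dropping them and adding features.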
