Heteroscedasticity, Multicollinearity, Diagnostics

Yassine_H · May 18, 2023, 5:18pm

Hello everyone,
I'm confused about how regression models are implemented in ML. In statistics, we usually use OLS to find the best regression model: one that keeps only the significant variables (features) and drops the insignificant ones. We do that by comparing each p-value with the significance level (alpha). That part is clear to me.
My problem is with the diagnostics. In OLS regression we have to verify that certain assumptions hold, and only if they hold can we use the regression model for prediction. Some of these assumptions are:

Homoscedasticity: the variance of the residuals is constant across all levels of the independent variables.
No multicollinearity: the independent variables are not highly correlated with one another.
No endogeneity: the independent variables are not correlated with the error term.
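
For readers who want to see what checking the first two assumptions can look like in code, here is a minimal sketch in plain Python. It is not from the course or from the original post. The data is synthetic, the function names (`pearson`, `vif_two_features`, `fit_line`, `koenker_bp_statistic`) are made up for illustration, and it assumes the simplest possible setting: at most two features. The heteroscedasticity check uses the n·R² (Koenker) form of the Breusch-Pagan test, and the multicollinearity check uses the variance inflation factor, which for exactly two features reduces to 1 / (1 - r²). Endogeneity is left out because it generally cannot be detected from the fitted data alone.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_features(x1, x2):
    """Variance inflation factor for exactly two features: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


def fit_line(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    b1 = (sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
          / sum((u - mean_x) ** 2 for u in x))
    return mean_y - b1 * mean_x, b1


def koenker_bp_statistic(x, y):
    """n * R^2 from regressing the squared OLS residuals on x.

    With a single regressor, R^2 of the auxiliary regression is just the
    squared correlation between x and the squared residuals.
    """
    b0, b1 = fit_line(x, y)
    squared_resid = [(v - b0 - b1 * u) ** 2 for u, v in zip(x, y)]
    r = pearson(x, squared_resid)
    return len(x) * r * r


random.seed(0)  # synthetic data, for illustration only
n = 500
x1 = [random.uniform(1, 10) for _ in range(n)]
x2_collinear = [u + random.gauss(0, 0.5) for u in x1]      # near-copy of x1
x2_independent = [random.uniform(1, 10) for _ in range(n)]
y_hetero = [2 + 3 * u + random.gauss(0, u) for u in x1]    # noise sd grows with x1

print("VIF, near-duplicate features:", round(vif_two_features(x1, x2_collinear), 1))
print("VIF, independent features:", round(vif_two_features(x1, x2_independent), 2))
print("Koenker-BP statistic:", round(koenker_bp_statistic(x1, y_hetero), 1))
```

Under the null of constant variance, the statistic is approximately chi-square with one degree of freedom here, so values above about 3.84 would reject homoscedasticity at the 5% level. For VIF, a common rule of thumb treats values above 5 or 10 as a warning sign. With more than two features you would want a proper library implementation rather than this sketch.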

The course didn’t mention these assumptions. Is that okay? For now I’m not sure whether ML is fundamentally different from traditional statistical modeling, or whether the two share the same diagnostics step.

Thank you.

rmwkwok · May 18, 2023, 5:49pm

Hi @Yassine_H,

In my opinion, the courses cover linear regression and logistic regression mainly to introduce gradient descent, which is crucial to the main dish: neural networks.

If you go through the course menus from the very beginning, we jump pretty quickly into gradient descent, learning rate, cost function, feature scaling, and so on. They all prepare us for neural networks.
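
To connect those topics to OLS for anyone arriving from statistics, here is a minimal sketch (not course code). It assumes a single feature and a squared-error cost J = (1/2n) Σ (w·x + b − y)², standardizes the feature, and runs batch gradient descent with a fixed learning rate. The function names and the synthetic data are made up for illustration. On this kind of problem gradient descent should land on essentially the same coefficients as the closed-form OLS fit.

```python
import random


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]


def closed_form_ols(x, y):
    """Closed-form OLS for y = b + w * x; returns (b, w)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    w = (sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
         / sum((u - mean_x) ** 2 for u in x))
    return mean_y - w * mean_x, w


def gradient_descent(x, y, learning_rate=0.1, steps=2000):
    """Batch gradient descent on the squared-error cost; returns (b, w)."""
    n = len(x)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * u + b - v for u, v in zip(x, y)]
        grad_w = sum(e * u for e, u in zip(errors, x)) / n  # dJ/dw
        grad_b = sum(errors) / n                            # dJ/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return b, w


random.seed(1)  # synthetic data, for illustration only
raw_x = [random.uniform(0, 100) for _ in range(200)]
y = [5 + 0.3 * v + random.gauss(0, 1) for v in raw_x]
x = standardize(raw_x)

b_ols, w_ols = closed_form_ols(x, y)
b_gd, w_gd = gradient_descent(x, y)
print("closed form:      b =", b_ols, " w =", w_ols)
print("gradient descent: b =", b_gd, " w =", w_gd)
```

Feature scaling matters here because it keeps the cost surface well conditioned, so one learning rate works for both parameters. Without it, the same learning rate on the raw 0 to 100 feature would diverge.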

Given neural networks as the goal (again, in my opinion), I don’t think a formal discussion of the traditional approach is what moves us most directly towards that goal.

Furthermore, the neural network approach is not identical to the traditional approach, and the focus here is on the former. So anyone who wants to learn the traditional approach first may need to look for other courses, and then come back here later for the neural network approach.

Welcome to the community, and cheers,
Raymond

TMosh · May 18, 2023, 5:55pm

Your concerns are not unusual for folks with a statistics background when they first approach ML. You’re used to applying “human learning”, but here the machine does most of that.

ML methods are quite different from statistical ones, but they often reach a similar goal.

tennis_geek · May 19, 2023, 4:16am

@Yassine_H
Hi
I can understand the concerns raised.
Purist mathematical and statistical modeling assumes the criteria you mentioned, and more, when optimizing parameters, especially for temporal models.
ML, on the other hand, does take these into consideration (at a much later stage), but as @rmwkwok mentioned, look at this course from the point of view of laying a foundation for optimization and the eventual ‘learning’ of the model. That said, you can still apply the data diagnostics from your own background and then fit the model; it shouldn’t be a deterrent at all.
I had similar doubts when I started. I come from temporal-spatial mathematical modeling of biological signaling systems. I more or less set aside the concepts I was trained in so that I could understand the ML basics on their own terms.
