Real world problems - multicollinearity and categorical data

Currently working with a dataset with the potential for linear regression. Something I have not seen covered in the course series is multicollinearity and handling of categorical data. Has anybody come across this? I have google searched, I have found information, but I’m curious as to how the community approaches this problems and any recommended resources for handling them.

Hi wclearly,

The way I see it multicollinearity is more a data science problem than a deep learning problem. When approaches are purely statistical, as in certain areas of data science (including linear regression), multicollinearity should be examined and possibly handled. You can probably find some discussions of multicollinearity in courses on data science with Python or R. When approaches are based on backpropagation, which is a mathematical optimization approach used in deep learning, multicollinearity constitutes less of a problem. Another reason that has been proposed for not having to deal with multicollinearity is that deep neural networks tend to be overparametrized (see, e,g., this discussion).
As for the handling of categorical data, object detection is treated in depth, as is sentiment analysis. But again, if you are looking for a truly statistical discussion of categorical data, I guess looking for a course in data science with Python or R is where you should be able to find this.

1 Like

Thanks for your response! Will read that discussion, and appreciate the insight.