I found the discussion about “Carrying Out Error Analysis” very interesting, but in the video only the classification problem is addressed. Is there some similar procedure for dealing with error analysis in the case of a regression problem?
For example, would it be correct to take a random sample of 100 items that have an error greater than the mean error?
Thank you in advance for your response!
Hey @sogatira,
Welcome to the community, and apologies for the delayed response. It's a nice question, and it made me think for a bit. Here is my opinion:
First of all, I expect there are established methods for performing error analysis on regression models as well. A quick Google search turned up two libraries, dataiku and error analysis. I haven't used them, but from what I have read, both should be able to help with error analysis for regression models.
Coming to my opinion, the method to “take a random sample of 100 items that have an error greater than the mean error” sounds great to me. Another idea along the same lines could be to sort the data points in decreasing order of their errors and take the top 100. One more idea comes to mind, which could be applied in the case of categorical features. Let's say that we have a categorical feature rating with 5 possible values: 1, 2, 3, 4, 5. In this case, we can find the category with the maximum mean error and analyze examples from that category. I hope this helps.
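To make the three ideas concrete, here is a minimal sketch using only the Python standard library. It assumes absolute error as the per-example error, and the function names and toy data are made up for illustration; they do not come from the course or from either library mentioned above.

```python
import random
from collections import defaultdict
from statistics import mean


def absolute_errors(y_true, y_pred):
    """Per-example absolute error (an assumed choice of error measure)."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def sample_above_mean(errors, k=100, seed=0):
    """Idea 1: random sample of up to k indices whose error exceeds the mean error."""
    threshold = mean(errors)
    candidates = [i for i, e in enumerate(errors) if e > threshold]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(candidates, min(k, len(candidates)))


def top_k_by_error(errors, k=100):
    """Idea 2: indices of the k examples with the largest errors, worst first."""
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:k]


def worst_category(errors, categories):
    """Idea 3: the category with the highest mean error, plus all per-category means."""
    grouped = defaultdict(list)
    for e, c in zip(errors, categories):
        grouped[c].append(e)
    mean_by_category = {c: mean(es) for c, es in grouped.items()}
    worst = max(mean_by_category, key=mean_by_category.get)
    return worst, mean_by_category


# Toy data, purely illustrative.
y_true = [3.0, 5.0, 2.0, 8.0, 4.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 3.0, 4.5, 6.0]
rating = [1, 2, 3, 3, 1, 2]  # hypothetical categorical feature

errors = absolute_errors(y_true, y_pred)
to_inspect_random = sample_above_mean(errors, k=100)
to_inspect_top = top_k_by_error(errors, k=2)
worst, per_category = worst_category(errors, rating)
```

The returned indices point back into the original dataset, so the selected examples can then be inspected by hand, just as in the classification case.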
Regards,
Elemento