Question about lecture - Error analysis

Flimdejong · January 12, 2024, 12:24pm

Hello,

I have a question about the lecture in week 3 about Error analysis under “Machine learning development process”. Around the 5:13 mark, he says you can use more data and add more features to combat the misclassification of emails. Why is this the case?

I thought if the model misclassified the email it meant the model could not grasp the complexity of this problem, and thus it is a high-bias problem. And earlier he said a high bias problem is typically not solved by adding more data. I do understand why you would use more features.

Thanks,
Flim

gent.spah · January 12, 2024, 2:02pm

Adding more data means the fit has to adapt to that data as well, so maybe it doesn’t overfit anymore!

Kic · January 12, 2024, 2:22pm

Hi @Flimdejong ,

This lecture on Error Analysis shows a different method for diagnosing a learning algorithm’s performance problem. Once you can see which categories of data are misclassified most often, getting more data or engineering more features for those categories is a way to help the algorithm learn and improve its accuracy.
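
As a rough sketch of that tallying step: the tag names and the handful of examples below are made up for illustration, and `tally_error_categories` is a hypothetical helper, not something from the course. The idea is just to count how often each hand-assigned category shows up among the misclassified cross-validation examples.

```python
from collections import Counter


def tally_error_categories(misclassified):
    """Count how many misclassified examples carry each hand-assigned tag.

    `misclassified` is a list of tag lists, one list per misclassified example.
    An example may carry several tags, so the counts can sum to more than
    the number of examples.
    """
    counts = Counter()
    for tags in misclassified:
        counts.update(set(tags))  # set() so a repeated tag counts once per example
    return counts.most_common()  # most frequent category first


# Hypothetical tags assigned by reading each misclassified email by hand.
misclassified_tags = [
    ["pharma"],
    ["phishing"],
    ["pharma", "misspelling"],
    ["phishing"],
    ["pharma"],
    ["unusual_routing"],
]

ranked = tally_error_categories(misclassified_tags)
for tag, count in ranked:
    print(f"{tag}: {count} of {len(misclassified_tags)}")
```

The categories at the top of the list are the ones where collecting more examples or designing extra features is most likely to pay off.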

Flimdejong · January 12, 2024, 4:17pm

How did you conclude it overfits?

TMosh · January 12, 2024, 7:59pm

That’s not a conclusive analysis.

Flimdejong · January 12, 2024, 8:41pm

I can understand that, yes. But that conclusion is what I understood from the lectures.

TMosh · January 12, 2024, 8:42pm

You can just as easily get incorrect predictions from overfitting (high variance) or underfitting (high bias).

rmwkwok · January 12, 2024, 11:52pm

Hello @Flimdejong,

To begin with, I think you are right that adding more data wouldn’t help a high-bias problem, and that’s exactly why, if someone says adding data helped, we should doubt whether the misclassified emails were due to high bias in the first place.

If you watch the video again at 6:34, 7:00, and 8:00, Andrew repeatedly mentions variance too (especially at 8:00). I recommend going through that video again, and this one too. Sometimes reviewing a video a second time on another day can give us a different view. For example, how would we diagnose a bias problem and a variance problem? This is something we must know clearly.
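
A minimal sketch of that diagnosis, assuming we already have a baseline error (for example human-level performance), the training error, and the cross-validation error. The function name `diagnose`, the 2% `gap` threshold, and the numbers are assumptions for illustration only; in practice what counts as a "large" gap is a judgment call.

```python
def diagnose(baseline_err, train_err, cv_err, gap=0.02):
    """Label a model from three error rates (fractions between 0 and 1).

    High bias: training error is well above the baseline.
    High variance: cross-validation error is well above the training error.
    `gap` is an arbitrary illustrative threshold for "well above".
    """
    high_bias = (train_err - baseline_err) > gap
    high_variance = (cv_err - train_err) > gap

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "neither"


# Training error is close to the baseline, but CV error is much higher:
# more data is a reasonable thing to try.
print(diagnose(baseline_err=0.106, train_err=0.108, cv_err=0.148))

# Training error is already far above the baseline: more data alone
# is unlikely to help much.
print(diagnose(baseline_err=0.106, train_err=0.150, cv_err=0.155))
```

So a misclassified email on its own does not tell us which case we are in; comparing these three numbers does.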

Cheers,
Raymond

Flimdejong · January 13, 2024, 9:33am

Ah I totally missed that. I will go through the video again. Thank you for your response!
