Course 3 Week 2 - Cleaning Up Incorrectly Labeled Data

Mohammadreza_azizi · October 7, 2022, 10:11am

So Andrew says:

I would urge you to consider examining examples your algorithm got right as well as ones it got wrong. It is easy to look at the examples your algorithm got wrong and just see if any of those need to be fixed. But it’s possible that there are some examples that it got right that should also be fixed. And if you only fix the ones that your algorithm got wrong, you end up with more biased estimates of the error of your algorithm. It gives your algorithm a little bit of an unfair advantage if you just double check what it got wrong but don’t also double check what it got right, because it might have gotten something right that it was just lucky on, and fixing the label would cause it to go from being right to being wrong on that example.

When we analyze errors, we focus on the examples (in the dev or test set) whose label differs from the prediction, so every example in this category is wrongly classified. I wonder what Andrew means by saying

examples your algorithm got right

Mubsi · October 7, 2022, 1:44pm

Hi @Mohammadreza_azizi,

I believe Andrew means what he illustrates at the start of the lecture video, where he shows a dog picture with the label 1 (meaning it is a cat): sometimes there are “mislabelled” images in your dataset, and your algorithm “correctly mislabels” them.

What I mean is this: the picture is of a dog, the ground truth label on it is 1, meaning cat (so the label is factually incorrect), and your algorithm also predicts 1. The prediction matches the label, so it counts as “correct”, but in reality this is a mislabelled picture.

The ground truth label and the predicted label should both have been 0.

So Andrew urges you to also check the “correctly” predicted examples, to make sure their labels are factually correct.

As the end of the quotation you shared says, when you change a “mislabelled” image to the correct label, check the algorithm’s prediction against the new label too: an example it seemed to get right may turn out to be one it got wrong.
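To make the “unfair advantage” concrete, here is a minimal sketch in Python. The toy dev set and the names `true_labels`, `given_labels`, `predictions`, `error_rate` and `fix_labels` are all made up for illustration and do not come from the lecture. It compares the measured error when you fix labels only on the examples the algorithm got wrong with the error when you fix labels everywhere.

```python
# Hypothetical toy dev set: 1 = cat, 0 = not cat.
true_labels  = [1, 0, 0, 1, 0, 1, 0, 0]  # what each image really shows
given_labels = [1, 1, 0, 1, 1, 1, 0, 1]  # dev-set labels; indices 1, 4, 7 are mislabelled
predictions  = [1, 1, 0, 0, 0, 1, 1, 1]  # what the algorithm predicted


def error_rate(preds, labels):
    """Fraction of examples where the prediction differs from the label."""
    wrong = sum(p != y for p, y in zip(preds, labels))
    return wrong / len(labels)


def fix_labels(given, truth, indices):
    """Return a copy of `given` with the labels at `indices` corrected."""
    fixed = list(given)
    for i in indices:
        fixed[i] = truth[i]
    return fixed


# Examples the algorithm "got wrong" according to the (noisy) dev-set labels.
misclassified = [
    i for i, (p, y) in enumerate(zip(predictions, given_labels)) if p != y
]

# Option A: review and fix labels only among the misclassified examples.
labels_fixed_wrong_only = fix_labels(given_labels, true_labels, misclassified)

# Option B: review and fix labels on every example, right or wrong.
labels_fixed_everywhere = fix_labels(
    given_labels, true_labels, range(len(given_labels))
)

print(misclassified)                                     # [3, 4, 6]
print(error_rate(predictions, given_labels))             # 0.375
print(error_rate(predictions, labels_fixed_wrong_only))  # 0.25
print(error_rate(predictions, labels_fixed_everywhere))  # 0.5
```

In this made-up data, fixing only the misclassified examples can only ever remove errors (index 4 stops counting as a mistake), so the estimate drops to 0.25. Indices 1 and 7 are the “lucky” cases: the prediction agreed with a wrong label, and once those labels are fixed too the error rises to 0.5.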

Hope this helps and answers what you were hoping for,
Mubsi
