Overall dev set error after fixing incorrectly labeled data

Dong_Zhang · October 25, 2023, 6:00pm

Dear Mentors,

To avoid direct reference to the quiz, I abstracted the question like this:
Overall dev set error: 20%
Errors due to incorrectly labeled data: 5%
Errors due to other reasons: 15%
Question: is it true that if we fix the incorrectly labeled data, we will reduce the overall dev set error to 15%?
I think the answer should be True, because fixing labels is different from fixing other causes (like image quality, etc.). In this case, no error is due to both mislabeled data AND other reasons; otherwise the overall dev set error would not match the sum of the individual causes. The hint says 5% is an estimate of a “ceiling”, but in my opinion, fixing the labels in the dev set guarantees that those 5% would be reduced to 0%, and the overall error would be 15%.
What am I missing here?

TMosh · October 26, 2023, 12:38am

Even if all the labels are correct, you will still be subject to the “other reason” errors at 15% of all of the examples.

So fixing the 5% of bad labels only improves the results by 85% of that 5%, i.e. about 4.25 percentage points.
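The arithmetic behind this reply can be sketched as follows. This is only one reading of the argument: it assumes the relabeled examples are still misclassified at the same 15% "other reason" rate as the rest of the dev set. The variable names are made up for illustration.

```python
# Figures taken from the abstracted question in this thread.
overall_error = 0.20      # overall dev set error
mislabeled_share = 0.05   # errors attributed to incorrect labels
other_error_rate = 0.15   # errors attributed to other reasons

# Assumption (not stated in the quiz): after relabeling, those examples
# are still subject to "other reason" errors at the same 15% rate.
recovered = mislabeled_share * (1 - other_error_rate)
new_error = overall_error - recovered

print(f"improvement: {recovered:.4f}")  # about 0.0425
print(f"new error:   {new_error:.4f}")  # about 0.1575
```

Under that assumption the error lands slightly above 15%, which is why 5% reads as a ceiling on the improvement rather than a guarantee.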

Dong_Zhang · October 26, 2023, 5:16am

Hi TMosh,

Thanks for the quick feedback. I think you mean that, among the 5% of errors attributed to mislabeling, about 15% are also caused by other errors, i.e. some misclassified samples are due to both mislabeling and other reasons. That makes sense.

However, the question is formulated so that the sum of the individual errors equals the overall error. This indicates that there is no overlap among the error categories, i.e. every misclassified sample is caused by exactly one type of error; otherwise the overall error would be smaller than the sum of the individual errors.
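The counting argument in this post can be illustrated with a small sketch on a hypothetical 100-example dev set. The index ranges are invented for illustration and do not come from the quiz.

```python
n = 100  # hypothetical dev set size

# Case 1: disjoint categories (5 mislabeled, 15 other-reason errors).
mislabeled = set(range(0, 5))
other = set(range(5, 20))
disjoint_overall = len(mislabeled | other) / n
disjoint_sum = (len(mislabeled) + len(other)) / n

# Case 2: two examples fall into both categories.
other_overlapping = set(range(3, 18))
overlap_overall = len(mislabeled | other_overlapping) / n
overlap_sum = (len(mislabeled) + len(other_overlapping)) / n

print(disjoint_overall, disjoint_sum)  # overall equals the sum
print(overlap_overall, overlap_sum)    # overall is smaller than the sum
```

This only shows that a matching sum is consistent with disjoint categories among the examples counted as errors. It says nothing about how the relabeled examples behave once their labels are fixed.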

TMosh · October 26, 2023, 5:51am

You are assuming more than the question contains.

Dong_Zhang · October 26, 2023, 12:44pm

All right, thank you.
