Doubts about error analysis

When training a neural network, we typically use training, development (dev), and test sets. We've identified some mislabeled data in these sets, and the proportion of mislabeled data in the training set turned out to be higher than in the dev and test sets. After correcting the mislabeled data found during error analysis and retraining the model, I'm concerned that the errors remaining in the training set might overshadow the improvements made to the dev set. Specifically, could the issues in the training set hurt the model's performance on the dev and test sets, and could its ability to generalize be compromised?

What do you mean?

The training set is what the model learns from, and if it learns from wrong labels, the dev and test results will be wrong too!

How much it affects things depends on the proportion of mislabeled examples and on how critical your application is.
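
If you want to put a rough number on it, you can simulate it: flip a fraction of the training labels and see how the dev accuracy moves. Here's a minimal sketch with synthetic data and scikit-learn (the model, data, and noise rates are all made up, just to illustrate the idea):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic binary classification data, split into train and dev
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

for noise_rate in [0.0, 0.05, 0.1, 0.2, 0.3]:
    y_noisy = y_train.copy()
    flip = rng.random(len(y_noisy)) < noise_rate   # pick examples to mislabel
    y_noisy[flip] = 1 - y_noisy[flip]              # binary labels, so just flip them

    model = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
    acc = model.score(X_dev, y_dev)                # dev labels stay clean
    print(f"train noise {noise_rate:.0%}: dev accuracy {acc:.3f}")
```

Typically you'll see that small amounts of random training-label noise barely move dev accuracy, while larger proportions start to hurt, which is why the fraction of mislabeled data matters more than the mere fact that some exists.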

Yes, I've seen in deep learning and neural network courses that correcting mislabeled data in the development (dev) and test sets and then retraining the model can sometimes be effective. However, this approach doesn't always yield the desired results.

After doing some research and consulting with colleagues, I've found a potential solution: collect additional data related to the specific mislabeled examples and add it to the dev set before retraining the model. This might address the issue effectively, but it could also be costly.
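
If I go that route, the bookkeeping could be as simple as something like this (a sketch in pandas; the file names and the "example_id"/"label" columns are hypothetical, just to show applying corrections and merging in new examples before re-evaluating):

```python
import pandas as pd

dev = pd.read_csv("dev.csv")                          # original dev set
corrections = pd.read_csv("corrections.csv")          # example_id -> corrected label
new_examples = pd.read_csv("new_dev_examples.csv")    # extra data collected later

# Overwrite labels for the examples flagged during error analysis
dev = dev.set_index("example_id")
dev.update(corrections.set_index("example_id")[["label"]])

# Append the newly collected examples and drop accidental duplicates
dev = pd.concat([dev.reset_index(), new_examples], ignore_index=True)
dev = dev.drop_duplicates(subset="example_id", keep="last")

dev.to_csv("dev_cleaned.csv", index=False)
```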

I’m interested in exploring alternative methods as well, given the potential expense of this solution. If you have any other suggestions or strategies for dealing with mislabeled data without incurring high costs, I would greatly appreciate your input.

Fixing mislabeled examples is just part of the data cleaning process.
It’s not really a Machine Learning issue.
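
That said, if you want to prioritise which labels to review instead of re-checking everything by hand, one cheap trick is to flag examples where a model confidently disagrees with the stored label. A rough sketch (plain scikit-learn; it assumes labels are integer-coded 0..K-1, and the threshold is arbitrary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def flag_suspect_labels(X, y, threshold=0.9):
    """Return indices of examples whose stored label the model confidently rejects."""
    # Out-of-fold probabilities, so each example is scored by a model
    # that never saw it during training
    proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                              cv=5, method="predict_proba")
    predicted = proba.argmax(axis=1)   # assumes classes are 0..K-1
    confidence = proba.max(axis=1)
    return np.where((predicted != y) & (confidence >= threshold))[0]

# Usage: hand-review these examples (and their labels) first
# suspect_idx = flag_suspect_labels(X_train, y_train)
```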
