Course 1 - Week 3 - Label consistency: unintelligible tag

vothaiduong14 · May 18, 2021, 10:03am

Question 1: Effect of using the ‘unintelligible’ tag on the model
For hard-to-recognize sounds, why would we label them with the ‘unintelligible’ tag instead of removing the example from the dataset? Is it because this lets the model output the ‘unintelligible’ tag when it encounters similar unrecognizable sounds?
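To make the two options in the question concrete, here is a minimal Python sketch. The tag string `[unintelligible]`, the `Example` record and `build_dataset` are hypothetical names chosen for illustration, not anything prescribed by the course. With the "tag" strategy, unclear words are replaced by one consistent token and the example stays in the training set, so a model trained on it can learn to emit that token for similar audio. With the "drop" strategy, any example containing unclear words is removed entirely.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical tag string; the exact token is an assumption.
UNINTELLIGIBLE = "[unintelligible]"


@dataclass
class Example:
    audio_id: str
    words: List[str]                                # labeler's transcription
    unclear: Set[int] = field(default_factory=set)  # indices the labeler could not make out


def tagged_transcript(ex: Example) -> str:
    """Replace every unclear word with the same tag so labels stay consistent."""
    return " ".join(
        UNINTELLIGIBLE if i in ex.unclear else w for i, w in enumerate(ex.words)
    )


def build_dataset(examples: List[Example], strategy: str) -> List[Tuple[str, str]]:
    """Return (audio_id, transcript) pairs under the 'tag' or 'drop' strategy."""
    if strategy == "tag":
        # Keep every example; unclear words become the consistent tag.
        return [(ex.audio_id, tagged_transcript(ex)) for ex in examples]
    if strategy == "drop":
        # Keep only examples the labeler understood completely.
        return [(ex.audio_id, tagged_transcript(ex)) for ex in examples if not ex.unclear]
    raise ValueError(f"unknown strategy: {strategy}")
```

With "tag", the dataset keeps its size and the model sees what unclear audio looks like; with "drop", the labels are cleaner but those examples are lost, which matters more when the dataset is small.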

Question 2: Data point or noise?
In one of the previous videos, the instructor said that consistent, clean data is important, especially for small datasets. So my questions are:

Do we treat the hard-to-recognize recordings as noise?
Suppose we can only get a few labels; would it be better to remove those recordings?

satishnandi · May 19, 2021, 11:49pm

Q1: The idea is to handle cases that the model cannot clearly classify or detect. Those examples can be reviewed manually and given correct labels, so that the model can improve in the future.
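A minimal sketch of the manual-review loop described in the answer above, assuming the model returns a confidence score with each transcript. The `route_for_review` name and the 0.6 threshold are hypothetical choices for illustration. Outputs with low confidence, or outputs containing the tag, go to a human queue; once relabeled, they can be added back to the training data.

```python
from typing import List, Tuple

# Hypothetical tag string and threshold; both are assumptions.
UNINTELLIGIBLE = "[unintelligible]"


def route_for_review(
    predictions: List[Tuple[str, str, float]], threshold: float = 0.6
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split (audio_id, transcript, confidence) outputs into accepted and review queues."""
    accepted: List[Tuple[str, str]] = []
    review: List[str] = []
    for audio_id, transcript, confidence in predictions:
        # Send anything the model is unsure about to a human labeler.
        if confidence < threshold or UNINTELLIGIBLE in transcript:
            review.append(audio_id)
        else:
            accepted.append((audio_id, transcript))
    return accepted, review
```

The threshold trades labeling effort against label quality: raising it sends more audio to humans.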

Q2: With a large dataset, you can afford a few mistakes. That is not the case with a smaller dataset, where you want near-perfect labels. One incorrect label out of 100 is better than one out of 10, because the same single error is a much bigger percentage of a smaller dataset (10% vs. 1%).
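The percentage argument can be checked with a few lines of Python; `noisy_fraction` is a hypothetical helper and the dataset sizes are arbitrary examples.

```python
def noisy_fraction(num_mislabeled: int, dataset_size: int) -> float:
    """Fraction of the training labels that are wrong."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return num_mislabeled / dataset_size


# The same single bad label weighs far more in a small dataset.
for size in (10, 100, 10_000):
    print(f"1 bad label in {size} examples -> {noisy_fraction(1, size):.2%} of labels wrong")
```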
