Question 1: effect of using the ‘unintelligible’ tag on the model
For hard-to-recognize sounds, why would we choose to use the ‘unintelligible’ tag instead of removing the example from the dataset? Is it because doing so enables the model to output ‘unintelligible’ when it encounters similarly unrecognizable sounds?
Question 2: data point or noise?
In one of the previous videos, the instructor said that consistent and clean data are important, especially for small datasets. So my questions are:
- do we treat the hard-to-recognize recordings as noise?
- suppose that we can only have a few labels; would it be better to remove the recording?
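
To make the two options I am comparing concrete, here is a minimal sketch. The file names, transcripts, and the exact `[unintelligible]` tag string are hypothetical and only for illustration; they are not taken from the course material.

```python
# Hypothetical toy dataset of (audio file, transcript) pairs.
# The tag string below is an assumed convention, not the course's exact format.
UNINTELLIGIBLE = "[unintelligible]"

examples = [
    ("clip_001.wav", "um, today's weather"),
    ("clip_002.wav", "[unintelligible]"),
    ("clip_003.wav", "nearest gas station [unintelligible]"),
]


def keep_with_tag(data):
    """Option A: keep every recording; unclear audio stays labeled with the tag."""
    return list(data)


def drop_unintelligible(data):
    """Option B: remove any recording whose transcript contains the tag."""
    return [(name, text) for name, text in data if UNINTELLIGIBLE not in text]


option_a = keep_with_tag(examples)
option_b = drop_unintelligible(examples)

# Compare how many training examples survive under each option.
print(len(option_a), len(option_b))
```

With Option A the model sees the tag as a possible output for unclear audio; with Option B those recordings never reach training, which on a small dataset also shrinks the number of examples.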