How Do We Best Train For Noisy Data? (Can A Staged Training Strategy for Noisy Data Improve Results?)

A common recommendation is to improve the model by discarding bad or noisy data. However, some of the data we are asking our model to act on is inherently noisy (e.g., voice recordings from a restaurant, out-of-focus family photos). What data-centric process do you recommend so that models train efficiently yet can still handle noisy data?

I have sometimes found that it makes sense to use a staged training process that parallels how children learn. The first stage focuses on the cleanest training data (leading to the fastest training results). Subsequent stages introduce progressively noisier data, interspersed with the stage-one data to preserve the model's performance on clean inputs. Finally, the last stages incorporate some of the noisiest examples, again interspersed with the cleaner (less noisy) data.
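To make the idea concrete, here is a minimal sketch of how such a staged clean/noisy mix might be scheduled. Everything here is illustrative (the datasets, stage sizes, and noise fractions are placeholders, not from the question); the point is only that each stage samples a pool with a growing share of noisy examples while clean examples remain interspersed throughout.

```python
import random


def make_stage_mix(clean, noisy, noisy_fraction, size, rng):
    """Build one stage's training pool by mixing clean and noisy examples.

    noisy_fraction: share of the pool drawn from the noisy set
    (0.0 = all clean, as in the first stage described above).
    """
    n_noisy = int(size * noisy_fraction)
    n_clean = size - n_noisy
    pool = rng.choices(clean, k=n_clean) + rng.choices(noisy, k=n_noisy)
    rng.shuffle(pool)  # intersperse clean and noisy examples
    return pool


# Toy stand-ins for real examples (hypothetical data).
clean_data = [f"clean_{i}" for i in range(100)]
noisy_data = [f"noisy_{i}" for i in range(100)]

rng = random.Random(0)

# Stage schedule mirroring the proposal: start fully clean,
# then ramp up the noisy share while keeping clean data present.
schedule = [0.0, 0.25, 0.5]  # noisy fraction per stage (illustrative)

for stage, frac in enumerate(schedule, start=1):
    pool = make_stage_mix(clean_data, noisy_data, frac, size=64, rng=rng)
    n_noisy = sum(1 for x in pool if x.startswith("noisy"))
    print(f"stage {stage}: {len(pool)} examples, {n_noisy} noisy")
```

In a real pipeline, each stage's pool would feed a normal training loop, and the schedule could be tuned (or made continuous) based on validation performance on both clean and noisy held-out sets.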

Do you agree with this approach? Have you seen this yield benefits in your own work?

Thank you!