Getting labeled data from humans even when ML is better than humans

am003e · February 5, 2023, 4:30am

On C3W1 slide 25, “Why compare to human level performance,” one listed benefit of ML still being worse than humans is that you can “get labeled data from humans.” However, I’ve worked with audio data for ML for a few years, and there you can still “get labeled data from humans” even when your ML is better than humans. For problems like audio denoising, all of the “labeled” training data is generated by humans, and the generated data need not be perceptible to a human in a way that matches its label. Here’s an example.

Say we have a speech detection task. The detector should output “1” if it detects human speech in an audio segment and “0” if not.
To create training data, humans take a recording of a speaker saying “Hello World” and mix it into heavy noise at an SNR of -10 dB. The segment containing the speech gets the label “1” and the surrounding audio, which lacks human speech, gets the label “0”.
A human “labeler” listening to the resulting recording cannot hear “Hello World” because there is so much noise. However, an ML algorithm trained to detect speech may still detect it, even though it is imperceptible to a human listener.

See what we did there? We got “labeled data from humans” that a human could not label after the fact.
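
For anyone who wants to try this, here is a minimal sketch of the data generation step described above. It is only an illustration under stated assumptions: the function names `power` and `mix_at_snr` are hypothetical, a 200 Hz tone stands in for the “Hello World” recording, and Gaussian noise stands in for real background audio. The point is that the labels come from where the human placed the speech, not from anyone listening to the result.

```python
import math
import random

def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db, offset):
    """Add `speech` into `noise` starting at sample `offset`, scaled so that
    the speech-to-noise power ratio over the speech region equals `snr_db`.
    Returns (mixture, labels); labels[i] is 1 where speech was inserted."""
    if offset < 0 or offset + len(speech) > len(noise):
        raise ValueError("speech must fit inside the noise clip")
    region = noise[offset:offset + len(speech)]
    target_speech_power = power(region) * 10 ** (snr_db / 10)
    gain = math.sqrt(target_speech_power / power(speech))
    mixture = list(noise)
    labels = [0] * len(noise)
    for i, s in enumerate(speech):
        mixture[offset + i] += gain * s
        labels[offset + i] = 1  # label is known by construction
    return mixture, labels

# Stand-in signals (assumptions, not real recordings).
rng = random.Random(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(sample_rate // 2)]          # 0.5 s "utterance"
noise = [rng.gauss(0.0, 1.0) for _ in range(2 * sample_rate)]  # 2 s of noise

mixture, labels = mix_at_snr(speech, noise, snr_db=-10.0, offset=sample_rate)
```

Here `labels` is a per-sample 0/1 target that a detector could be trained on, and it is correct regardless of whether a listener could pick the speech out of `mixture`.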

Therefore, the first bullet point on this slide should be updated to say “Use humans to label existing data.” That wording excludes the very common case where humans generate labeled training data from scratch rather than labeling data that already exists in the wild.

reinoudbosch · February 15, 2023, 9:42pm

Hi am003e,

Great point!
I have opened an issue on GitHub suggesting that this be looked into for a future revision of the lecture / slides.
Thanks!
