I have a question about one of the quiz items in week 1.
Please DO NOT READ further if you have not taken the quiz.
Remember your honor!!!
.
.
.
.
.
.
Okay. The quiz asked True or False: The only way to acquire data for a supervised learning algorithm is to manually label it. I.e., given the input A, to ask a human to provide B.
I answered False, and the correct answer was True. Can someone explain this further? How could a supervised learning algorithm extract insights from data that isn't labeled by a human? Thank you!
@Scooter2000 Welcome to the community.
A supervised learning algorithm requires you to provide a set of inputs and output(s) for each training example. It's the presence of the outputs that makes it supervised learning, not whether they were labeled by humans. An example would be estimating housing prices from the size of the house, the number of floors, etc. There the output (the sale price) comes from records of actual sales, so the training data doesn't need to be labeled by humans.
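A minimal sketch of the housing example, using made-up sizes and prices purely for illustration: each training example pairs an input (size) with an output (price), and the fitting step only needs those pairs. It never asks where the price came from. The helper name `fit_line` is hypothetical, and the model is plain one-feature least squares.

```python
# Hypothetical training data: each example pairs an input (house size in
# square feet) with an output (sale price in $1000s). The prices stand in for
# values recorded from actual sales, not labels typed in by an annotator.
sizes = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [200.0, 290.0, 380.0, 470.0]


def fit_line(xs, ys):
    """Fit y = w * x + b by ordinary least squares (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


w, b = fit_line(sizes, prices)
print(f"price ~ {w:.3f} * size + {b:.1f}")
print(f"predicted price for 1800 sq ft: {w * 1800 + b:.1f}")
```

The same code would work whether the prices were scraped from sales records or typed in by an annotator. That is the point being made: "supervised" describes having outputs, not how they were obtained.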
Thank you! That does clear it up.
Hi @Scooter2000, thanks for posting your question.
I think there is a bit of ambiguity in the question you asked. Take the statement above:
The only way to acquire data for a supervised learning algorithm is to manually label it. I.e., given the input A, to ask a human to provide B.
You answered False, i.e., you think it's not the only way of acquiring data for supervised learning, while the answer is TRUE.
I would say that to acquire data for supervised learning, we do need manual labelling, and it is actually the only way to do so. Why? Supervised learning algorithms work on labelled data, where both the input and the output feature values are specified. To collect and build a dataset for a supervised algorithm, we need the real-world output values, which only humans can provide; there is no other way. Most companies that require such data spend months and a lot of money collecting and labelling it, because it is indeed done by humans, by hand. It is easy to see that a machine will only learn from the data you provide it: if you give it incorrect or wrongly valued data, your model will not produce any useful outcome. So the only way left is for people to label the data correctly, so that machines learn correctly.
One could also say that we can use synthetic data, i.e., produce new data from data we already have, but you still need some initial data to start with, and that data is the result of human labelling. Thus the statement above is true. The question is really asking whether human labelling is the only way of acquiring data for a supervised model, and YES, that is true.
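To illustrate the synthetic-data point, here is a minimal sketch of one simple technique among many (input jittering). The seed data, labels, and the `augment` helper are hypothetical. New examples are made by adding small random noise to the inputs of existing labeled examples while copying their labels unchanged, so every synthetic label traces back to an original one.

```python
import random

# Hypothetical seed dataset of (input features, label) pairs.
seed_data = [([1.0, 2.0], "cat"), ([3.0, 1.5], "dog")]


def augment(dataset, copies_per_example, noise_scale, rng):
    """Create synthetic examples by jittering inputs; labels are copied unchanged."""
    synthetic = []
    for features, label in dataset:
        for _ in range(copies_per_example):
            noisy = [f + rng.gauss(0.0, noise_scale) for f in features]
            synthetic.append((noisy, label))
    return synthetic


rng = random.Random(0)  # fixed seed so the run is reproducible
new_data = augment(seed_data, copies_per_example=3, noise_scale=0.1, rng=rng)
print(len(new_data))  # 6 (2 seed examples x 3 copies each)
for features, label in new_data:
    print(label, [round(f, 3) for f in features])
```

Note that this sketch only multiplies existing labels; it does not by itself say where the seed labels came from.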
I hope I have answered your query,
Thanks and regards,
@Amit_Shukla