C1W3_Data_Labeling_Ungraded_Lab

remimse · June 21, 2021, 8:45am

Short question regarding the ungraded lab for Week 3 of Course 1.

I understand that the objective of this Lab is to illustrate the impact of the labelling technique (performed on the training set) on the performance of the model (measured on the test set).

However, I am not really sure about the use of rules to label training data. Is it solely for the purpose of this lab, or is it also a technique that is advised for real-world problems?

If we find a rule to label our training data that gives good model performance on the test set, it means that our labelling is consistent with the test ground truth and that the model manages to ‘replicate’ the logic of the rule. So why is there a need to build a model? Why can’t we use the rules directly?

Thanks!

Great course with great content!

Remi

fabioantonini · June 21, 2021, 9:18am

Hi @remimse,
welcome to our community!
My opinion is that the purpose of this lab is to highlight the importance of an effective data labeling process, because it “affects the performance of a classification model”.
I would also add that the ultimate task of a model is to work well in real life, on real (unseen) data, once it has been deployed. Accuracy on the test set is our estimate of how the model will perform in the real world, and real-world performance is what ultimately matters.
The better the data is labeled, the more effective the model will be: accuracy on the test set will be higher, and so will the model's predictive power on unseen data in the real world.
So a proper and effective data labeling process is definitely advised in real-world problems.
Regards

remimse · June 22, 2021, 3:49am

Thanks @fabioantonini for answering my question right away.

Indeed a proper and effective data labelling process is necessary.

My question was more on the techniques used for labelling. Is using rules for labelling recommended as well or was it just for the purpose of the lab?

One could think that if the rules work well for labelling, then they could be used for prediction directly, as a replacement for the model.

Thanks!

fabioantonini · June 22, 2021, 5:30am

Hi @remimse,
Now I fully understand your question.
In my opinion, rules for labelling are always recommended. They are not just for the purpose of the lab.
When the data are properly labelled, training the model is more effective.
However, the rules should not be seen as a replacement for the model at prediction time, because I think they cannot generalize to unseen data. A properly trained model, by contrast, can generalize because it has been trained to learn patterns in the data. So what makes an ML model really powerful compared with a set of rules is its ability to generalize and make inferences on data it has never seen before.
Furthermore, labelling is often done by highly skilled human experts and is really expensive. A properly trained inference engine is faster, generalizes better, and is less expensive.
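
To make the rule-versus-model point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and is not taken from the lab: the keyword rule (`SPAM_KEYWORDS`), the eight toy comments, and the small hand-written naive Bayes classifier. The rule labels the training comments, the model is trained only on those rule-generated labels, and both are then applied to a comment that contains none of the rule's keywords.

```python
import math
from collections import Counter

# Hypothetical labelling rule: a comment is spam if it contains a keyword.
SPAM_KEYWORDS = ("free", "subscribe")

def rule_label(text):
    """Return 1 (spam) if any keyword appears as a word, else 0 (not spam)."""
    words = text.lower().split()
    return int(any(k in words for k in SPAM_KEYWORDS))

def train_naive_bayes(texts, labels):
    """Count words per class; this is all the 'training' naive Bayes needs."""
    counts = {0: Counter(), 1: Counter()}
    class_totals = Counter(labels)
    for text, y in zip(texts, labels):
        counts[y].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, class_totals, vocab

def predict(model, text):
    """Pick the class with the higher log-probability (add-one smoothing)."""
    counts, class_totals, vocab = model
    n = sum(class_totals.values())
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        score = math.log(class_totals[y] / n)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[y][w] + 1) / (total + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

# Toy training comments (made up for this sketch), labelled by the rule only.
train = [
    "free gift cards click my channel",
    "subscribe to my channel for free stuff",
    "click here free followers",
    "subscribe and click my channel link",
    "great song love the guitar",
    "this video made my day",
    "love this song so much",
    "the guitar solo is great",
]
labels = [rule_label(t) for t in train]
model = train_naive_bayes(train, labels)

# An unseen comment with no rule keyword in it.
unseen = "click my channel link"
print("rule :", rule_label(unseen))      # rule : 0
print("model:", predict(model, unseen))  # model: 1
```

The rule misses the unseen comment because neither keyword is present, while the model flags it because words such as "click" and "channel" co-occurred with the keywords in the rule-labelled training data. The caveat is that a model trained this way also inherits any systematic mistakes the rule makes, so rule-generated labels are usually worth checking against a hand-labelled sample.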
Hope this can help
Regards

remimse · June 24, 2021, 7:05am

I see.

Models can leverage the labels provided by the rules to generalise better.

Thanks.

Remi
