Short question regarding the ungraded lab for Week 3 of Course 1.

I understand that the objective of this Lab is to illustrate the impact of the labelling technique (performed on the training set) on the performance of the model (measured on the test set).

However, I am not really sure about the use of rules to label training data. Is it solely for the purpose of this lab, or is it also a technique that is advised for real-world problems?

If we find a rule to label our training data which allows us to have good model performance on the test set, it means that our labelling is consistent with the test ground truth and that the model manages to ‘replicate’ the logic of the rule. So why is there a need to build a model? Why can’t we use the rules directly?


Great course with great content!


Hi @remimse
welcome to our community!
My opinion is that the purpose of this lab is to highlight the importance of having an effective data labeling process, because it “affects the performance of a classification model”.
I also want to point out that the final task of a model is to work well in real life, on real (unseen) data, once it has been deployed. The accuracy of the model on the test set is our estimate of how it will perform in the real world, where mistakes matter more.
The better the data is labeled, the more effective the model will be: accuracy on the test set will be higher, and so will the predictive power of the model on unseen data in the real world.
So a proper and effective data labeling process is definitely advised in real world problems.


Thanks @fabioantonini for answering my question right away.

Indeed a proper and effective data labelling process is necessary.

My question was more about the techniques used for labelling. Is using rules for labelling recommended as well, or was it just for the purpose of the lab?

One could think that if the rules work well for labelling, then they could be used directly for prediction, as a replacement for the model.


Hi @remimse
Now I have fully understood your question.
In my opinion, rules for labelling are always recommended. They are not just for the purpose of the lab.
When the data are properly labelled, the training of the model is more effective.
That said, the rules cannot be treated as a replacement for the model at prediction time, because I think they cannot generalize to unseen data. A properly trained model, instead, is able to generalize because it has been trained to learn patterns in the data. So what makes an ML model really powerful compared with a set of rules is its ability to generalize and make inferences on data it has never seen before.
Furthermore, the labelling process is often done by humans with deep expertise in the subject and is really expensive. A properly trained inference engine is faster, generalizes better and is less expensive.
Hope this can help
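
To make the generalization point above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and is not taken from the lab: the keyword rule (`rule_label`), the six toy comments, and the small naive Bayes helpers (`train_naive_bayes`, `predict`). The rule labels the training comments, and a model is then trained on those rule-generated labels.

```python
import math
from collections import Counter

def rule_label(text):
    """Hypothetical labelling rule: spam (1) if the comment contains a link, else 0."""
    return 1 if "http" in text.lower() else 0

# Toy training comments (invented for this sketch).
train_texts = [
    "check out my channel http://example.com",
    "subscribe to my channel http://example.com/a",
    "please subscribe and check http://example.com/b",
    "great song love it",
    "this song is great",
    "love this video so much",
]

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(texts, labels):
    """Count words per class; returns (word_counts, class_counts, vocab)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Multinomial naive Bayes with add-one smoothing; unknown words are ignored."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c in (0, 1):
        score = math.log(class_counts[c] / total)
        denom = sum(word_counts[c].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((word_counts[c][word] + 1) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

# Label the training set with the rule, then train on those labels.
labels = [rule_label(t) for t in train_texts]
model = train_naive_bayes(train_texts, labels)

# An unseen comment that contains no link, so the rule does not fire.
unseen = "subscribe to my channel"
print("rule:", rule_label(unseen), "model:", predict(model, unseen))
```

On this toy data the rule returns 0 for the unseen comment because there is no link in it, while the model returns 1 because it picked up words that co-occurred with links in the training set. Whether the model is actually right depends on the true label of that comment, and a model trained this way also inherits any systematic mistakes the rule makes, so a test set with trusted labels is still needed to check it.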

I see.

So models can leverage the labels provided by the rules to generalise better.


