Week 3: Improving HLP for the tasks where it is hard to create labeling instructions

gsasikiran · June 11, 2021, 7:21pm

How can we improve HLP for labeling unstructured text data for which it is hard to enumerate all the labeling instructions?

For example, we have a short-answer grading dataset in which one grader may assign 4/5 while a second, more lenient grader gives 5/5. Many factors (for example grammar, topics covered, terms used, and so on) can affect a grade, particularly when multiple graders assign the grades. It is also necessary to find labelers who have a similar level of understanding and skill. Even if there are labeling instructions, it doesn’t make sense to stick rigidly to the listed cases, as there can be multiple ways to give a correct answer. In cases like these, what are the best approaches to improve HLP?

tranvinhcuong · June 15, 2021, 3:25am

Hi @gsasikiran, welcome to the course!

In my opinion, if we cannot write good instructions, or the labelers cannot follow them, then there is not much we can do to improve HLP.
A more practical approach is to collect feedback from the production deployment, do error analysis on it (tagging the causes of bad performance, as in the course), and then improve the data labeling iteratively.
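
The tagging step could be sketched roughly as follows. This is only an illustration, not code from the course: the record fields (`predicted`, `label`, `tags`), the tag names, and the helper `tag_error_share` are all hypothetical, and it assumes each production example has already been tagged by hand.

```python
from collections import Counter

def tag_error_share(examples):
    """Return {tag: fraction of misgraded examples carrying that tag}."""
    # An "error" here is any example where the model's grade differs from the label.
    errors = [ex for ex in examples if ex["predicted"] != ex["label"]]
    if not errors:
        return {}
    counts = Counter(tag for ex in errors for tag in ex["tags"])
    # most_common() orders tags from most to least frequent among the errors.
    return {tag: n / len(errors) for tag, n in counts.most_common()}

# Hypothetical production feedback for a short-answer grader (made-up data).
feedback = [
    {"predicted": 4, "label": 5, "tags": ["grammar"]},
    {"predicted": 5, "label": 5, "tags": []},
    {"predicted": 3, "label": 5, "tags": ["paraphrased answer", "grammar"]},
    {"predicted": 2, "label": 4, "tags": ["paraphrased answer"]},
    {"predicted": 4, "label": 4, "tags": ["grammar"]},
]

shares = tag_error_share(feedback)
for tag, share in shares.items():
    print(f"{tag}: {share:.0%} of errors")
```

Tags that account for a large share of the errors are the ones where clarifying the labeling instructions, or relabeling, is most likely to pay off in the next iteration.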

There may be a better solution, but that’s all I can offer.
Cuong
