How can we improve human-level performance (HLP) when labeling unstructured text data for which it is hard to write down complete labeling instructions?
For example, we have a short-answer grading dataset in which one grader might give an answer 4/5 while a second, more lenient grader gives the same answer 5/5. Many factors can affect a grade (for example grammar, which topics are covered, which key terms are used, and so on), and the variation grows when multiple graders are assigning grades. It is also necessary to recruit labelers with a similar level of understanding and skill. Even when labeling instructions exist, it doesn't make sense to follow them rigidly, because there can be multiple ways to write a correct answer. In cases like these, what are the best approaches to improve HLP?
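To make the 4/5 vs 5/5 disagreement concrete, here is a minimal sketch of how HLP might be estimated for this kind of data, assuming HLP is measured as the agreement rate between two graders scoring the same answers. The function name `agreement_rate`, the `tolerance` parameter, and the scores are all hypothetical and only for illustration, not real data.

```python
from typing import Sequence


def agreement_rate(grader_a: Sequence[int], grader_b: Sequence[int], tolerance: int = 0) -> float:
    """Hypothetical helper: fraction of answers where two graders' scores
    differ by at most `tolerance` points."""
    if len(grader_a) != len(grader_b) or len(grader_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(grader_a, grader_b))
    return matches / len(grader_a)


# Hypothetical scores (out of 5) that two graders gave the same six answers.
strict = [4, 3, 5, 2, 4, 3]
lenient = [5, 3, 5, 3, 5, 4]

print(agreement_rate(strict, lenient))               # exact agreement: 2 of 6 answers
print(agreement_rate(strict, lenient, tolerance=1))  # within one point: 6 of 6 answers
```

With these made-up scores, exact agreement is 2/6 but agreement within one point is 6/6, so the choice of tolerance alone changes the HLP estimate considerably.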