Measuring Human-level Accuracy

I’ve always had this question in mind: how am I supposed to calculate human-level accuracy for a given dataset? For example, in the week two speech recognition dataset example, how should I know that the human transcriptionist made mistakes in transcription? What is my reference for right and wrong in this situation?

Hi kmlaawany,

Interesting question. From an ML point of view, you always need a reference that you consider to be the “truth”. So, when you develop your ML algorithm, you have to trust that your data is correct.

Making sure that the data is correct is a different matter. If you are still curious, I recommend taking a look at some of Andrew Ng’s talks on the topic, “From Model-centric to Data-centric AI”: A Chat with Andrew on MLOps: From Model-centric to Data-centric AI - YouTube


If I take one of the course’s examples, image classification, the ground-truth source is usually a human who labelled the dataset manually in the first place. I’m assuming that establishing an HLP baseline requires a second person to label the dataset independently, and HLP is then defined as their level of agreement/disagreement?
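To make the agreement idea concrete, here is a minimal sketch of what that baseline could look like in Python. The labels and labelers are made up for illustration; the course doesn’t prescribe this exact computation.

```python
# Hypothetical sketch: two independent labelling passes over the same
# image-classification dataset (labels here are invented examples).
labeler_a = ["cat", "dog", "dog", "cat", "bird", "dog"]
labeler_b = ["cat", "dog", "cat", "cat", "bird", "dog"]

# Agreement rate: the fraction of examples where the two humans gave
# the same label. This is one simple proxy for an HLP baseline.
agreement = sum(a == b for a, b in zip(labeler_a, labeler_b)) / len(labeler_a)
print(f"HLP estimate (pairwise agreement): {agreement:.2f}")
```

With two labelers you can only measure agreement, not who is right; metrics like Cohen’s kappa additionally correct for agreement that would happen by chance.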

I think this is a great question. One proxy for human-level accuracy is to average the results of several expert humans to form the ground truth, the idea being that random errors will average out.
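As a rough sketch of that averaging idea, assuming three hypothetical experts labelling the same examples, you could form ground truth by majority vote and then score each expert against it:

```python
from collections import Counter

# Hypothetical sketch: three expert labellers over the same five examples.
expert_labels = [
    ["cat", "dog", "dog", "cat", "bird"],   # expert 1
    ["cat", "dog", "cat", "cat", "bird"],   # expert 2
    ["cat", "dog", "dog", "cat", "cat"],    # expert 3
]

# Majority vote per example: uncorrelated individual errors tend to cancel.
ground_truth = [
    Counter(votes).most_common(1)[0][0]
    for votes in zip(*expert_labels)
]

# Each expert's accuracy against the voted ground truth; their average
# is one estimate of human-level accuracy on this dataset.
accuracies = [
    sum(p == t for p, t in zip(expert, ground_truth)) / len(ground_truth)
    for expert in expert_labels
]
print("ground truth:", ground_truth)
print("mean human accuracy:", sum(accuracies) / len(accuracies))
```

Note this only averages out *random* disagreement; if all the experts share a systematic bias, the voted ground truth inherits it.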

I still think this is a great topic to further discuss if anyone has more insights!