I am confused by the statement that 70% of labelers will use the first one and 30% of labelers will use the second one. Then if we have two labelers…

Are those two labelers different from the 70% and 30% above? How did we come up with the 70% and 30%?

For example, I have this sentence: "I feel your service is OK". I have 10 labelers; 6 label it as positive and 4 label it as negative. What will the equation for HLP be then?

Second question: what if there are four possible labels for the same sentence, say positive, negative, indeterminate, and neither, with 10 labelers as above? How can I calculate HLP?

Thanks for your help

Hi @Abdulrahman, welcome to the community!

70% is the probability that a labeler chooses the first representation as the transcript for the voice input. 30% is the probability that a labeler chooses the other representation.

When we have two labelers, the HLP is the probability that the two of them agree on the representation of a voice input. It is calculated in the video as:

0.7*0.7 (two labellers agree it is first representation) + 0.3*0.3 (two labellers agree on the second) = 0.58
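The same calculation can be written as a small function. This is only a sketch: the name `agreement_probability` and its parameters are made up for illustration, and it assumes each labeler picks a representation independently with the same probabilities.

```python
def agreement_probability(label_probs, n_labelers=2):
    """Probability that n independent labelers all pick the same label.

    Hypothetical helper: assumes every labeler draws from the same
    distribution `label_probs`, independently of the others.
    """
    if abs(sum(label_probs) - 1.0) > 1e-9:
        raise ValueError("label probabilities must sum to 1")
    # For each label, p ** n is the chance that all n labelers choose it;
    # summing over labels gives the chance that they agree on any label.
    return sum(p ** n_labelers for p in label_probs)


print(round(agreement_probability([0.7, 0.3]), 2))  # 0.58
```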

For your example, I would interpret it as follows: with a probability of 0.6 a labeler marks the sentence as positive, and with a probability of 0.4 a labeler marks it as negative. You can apply the same formula as above for two labelers. For more than two labelers, sum over the labels the probability that all of them choose that label.

For the second question:

When you have more than two labels, once the labelers have marked all the inputs, you can compute the fraction of labelers that chose each label, i.e. you have an empirical probability of how likely a labeler is to assign each label to an input sentence. The HLP is again the probability that all the labelers agree on one output, so you can apply the formula above:
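Worked out for the 6-versus-4 split, and assuming the labelers choose independently of each other (the symbols below are just shorthand for this example), the numbers would be:

```latex
p_{\text{pos}} = \tfrac{6}{10} = 0.6, \qquad p_{\text{neg}} = \tfrac{4}{10} = 0.4

% two labelers agree
P(\text{agree} \mid 2) = 0.6^2 + 0.4^2 = 0.36 + 0.16 = 0.52

% k labelers all agree
P(\text{agree} \mid k) = 0.6^k + 0.4^k,
\qquad \text{e.g. } k = 3:\; 0.216 + 0.064 = 0.28
```

Note how the agreement probability drops as more labelers are required to agree.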

Let's say the probabilities for the 4 labels are 0.6, 0.2, 0.1, and 0.1, and there are two labelers.

The HLP = 0.6*0.6 + 0.2*0.2 + 0.1*0.1 + 0.1*0.1 = 0.42
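If you want to sanity-check this number, one option is a quick simulation. The sketch below is illustrative only: `simulate_agreement` is a made-up helper, and it assumes the two labelers pick labels independently from the same four probabilities.

```python
import random


def simulate_agreement(label_probs, n_labelers=2, n_trials=100_000, seed=0):
    """Estimate how often n independent labelers all pick the same label.

    Hypothetical helper for illustration; labels are just indices 0..k-1.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    labels = list(range(len(label_probs)))
    agreements = 0
    for _ in range(n_trials):
        # Draw one label per labeler according to the given probabilities.
        picks = rng.choices(labels, weights=label_probs, k=n_labelers)
        if len(set(picks)) == 1:  # every labeler chose the same label
            agreements += 1
    return agreements / n_trials


estimate = simulate_agreement([0.6, 0.2, 0.1, 0.1])
print(estimate)  # should land close to the 0.42 computed above
```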

Hope it helps.

Cuong


Let me add some clarification on where the 70% and 30% come from, since I think that is what confused you. In my understanding, before you can train the model, you need to get your data labeled, and you can hire people to do it. For example, you hire 100 people to label the data; 70 of them label the input using the first representation, and 30 of them use the other representation. That is where the numbers come from.

For the HLP calculation, you then need to choose how many labelers to compare; in the video example, there were two.
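As a rough sketch of that two-step process (count the labels first, then pick the number of labelers), something like the following would work. The helper names and the `"first"`/`"second"` label strings are placeholders I made up, and the labelers are assumed to be independent.

```python
from collections import Counter


def empirical_probs(labels):
    """Turn a list of collected labels into per-label fractions."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def agreement_from_probs(probs, n_labelers=2):
    """Chance that n independent labelers all choose the same label."""
    return sum(p ** n_labelers for p in probs.values())


# 100 hired labelers: 70 used the first representation, 30 the second.
collected = ["first"] * 70 + ["second"] * 30
probs = empirical_probs(collected)
print(probs)  # {'first': 0.7, 'second': 0.3}
print(round(agreement_from_probs(probs, n_labelers=2), 2))  # 0.58
```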

Thank you so much @tranvinhcuong. A very articulate explanation. Makes sense to me. Appreciated.

Yes, the confusion was about where the probability was calculated from. It seems to be a prior probability, as you illustrated.
