Exercise 4: compute accuracy

quicksilver · April 18, 2022, 8:19am

There are two ways to get the accuracy of the predictions (after taking care of padding):

1. Get the mean of the accuracy of predictions from each sentence.
2. Consider the entire batch of sentences as one big set and calculate the accuracy of that complete set in a single operation.

The two can give different numerical results. Which one is correct? I think the grader considers the second one correct, but can someone clarify why that is the case?

Thanks!

SainiAnkit · April 18, 2022, 9:05am

For each sentence (after taking care of padding), you need to find the number of correctly predicted tags and divide this number by the total number of unpadded tokens. This gives you the accuracy for a single sentence.

You can perform the above operation by iterating over each sentence or on the entire batch at once using vectorization. Both methods will give you the same result.
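
As a minimal sketch of the per-sentence computation described above, the loop and the vectorized version can be compared on a toy batch. The padding id `PAD_ID = -1`, the function names, and the toy values are assumptions made for illustration; the assignment's actual padding value and function signature may differ.

```python
import numpy as np

PAD_ID = -1  # hypothetical padding id, chosen for illustration only

# Toy batch: 2 sentences, padded to length 4 (values are made up).
labels = np.array([[1, 2, 0, PAD_ID],
                   [0, 0, PAD_ID, PAD_ID]])
preds = np.array([[1, 0, 0, 3],
                  [0, 1, 2, 2]])

def per_sentence_accuracy_loop(preds, labels, pad_id=PAD_ID):
    """Accuracy of each sentence, computed one sentence at a time."""
    accs = []
    for p, y in zip(preds, labels):
        keep = y != pad_id              # True for real (unpadded) tokens
        accs.append(np.mean(p[keep] == y[keep]))
    return np.array(accs)

def per_sentence_accuracy_vectorized(preds, labels, pad_id=PAD_ID):
    """Same quantity, computed for the whole batch at once."""
    mask = labels != pad_id
    correct = (preds == labels) & mask  # padded positions never count
    return correct.sum(axis=1) / mask.sum(axis=1)

loop_acc = per_sentence_accuracy_loop(preds, labels)
vec_acc = per_sentence_accuracy_vectorized(preds, labels)
```

Both functions return one accuracy per sentence. How those values are combined into a single batch-level number is a separate choice.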

quicksilver · April 19, 2022, 4:33am

I understand and it’s also the same as the second method that I mentioned in my original post. I’d like to know why (and if) the first method is incorrect. Let’s take a small example here:

matches = [[True, True, PAD],
           [True, False, False],
           [True, False, True]]

where True means the predicted label is the same as the target label and False means the predicted label is not the same as the target label. PAD is the part of sentence to ignore.

Now accuracy based on the two methods that we’re discussing:

1. Mean accuracy from individual sentence accuracies:
   acc1 = 2/2
   acc2 = 1/3
   acc3 = 2/3
   final_acc = (acc1 + acc2 + acc3)/3 = 0.67
2. Accuracy of all the sentences as a whole:
   final_acc = 5/8 = 0.625

I can think of computation speedup as one reason why we’d prefer the second option but from the POV of correctness, I’m still not clear why we’d prefer the second option.
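
The two numbers in the example can be reproduced with a short NumPy sketch. The separate boolean `mask` array standing in for the PAD position is an assumption made for illustration, not something taken from the assignment.

```python
import numpy as np

# matches[i, j] is True when the predicted tag equals the target tag.
matches = np.array([[True, True, False],   # last slot is padding; its value is ignored
                    [True, False, False],
                    [True, False, True]])
# mask[i, j] is True for real tokens and False for PAD positions.
mask = np.array([[True, True, False],
                 [True, True, True],
                 [True, True, True]])

correct = matches & mask

# Method 1: accuracy per sentence, then the mean over sentences.
per_sentence = correct.sum(axis=1) / mask.sum(axis=1)
mean_of_sentences = per_sentence.mean()

# Method 2: pool every unpadded token in the batch into one set.
pooled = correct.sum() / mask.sum()

print(round(mean_of_sentences, 3), round(pooled, 3))
```

The first value (about 0.667) weights every sentence equally, while the second (0.625) weights every token equally, so longer sentences contribute more to it. The two agree only when all sentences have the same unpadded length or the same accuracy.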

SainiAnkit · April 19, 2022, 6:25am

The first method is correct and should be used to compute the accuracy on a single sentence as well as on a batch of sentences.

I think the second method is not the correct way to compute accuracy. I will discuss this with the team.

drew_Frances · May 15, 2022, 10:08pm

I am having problems with computing accuracy. I believe I have used argmax() properly. However I am having problems building the mask. I’m assuming when I build the mask, I am checking output against the pad? As a part of the error, I’m getting

Wrong output: Pad token is being considered in accuracy calculation. Make sure to apply the mask…

What should I be reading as reference? I tried masking like the previous class assignment. What am I missing?

just in case, my lab id is whacbnpw

Cheers,
Drew

SainiAnkit · May 18, 2022, 3:26am

The PAD token has a unique index. You need to remove the effect of padding when computing accuracy.

You can practice masking by manually generating a random array of numbers and then masking different numbers to see the results. You can use NumPy for this.
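
One way to try this, as a rough sketch: generate a small random integer array, pick a value to treat as padding, and inspect what the boolean mask keeps. The array shape, the seed, and the name `masked_number` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)       # seeded so runs are repeatable
values = rng.integers(0, 5, size=(3, 6))  # random integers in [0, 5)

masked_number = 0                         # hypothetical value to treat as padding
mask = values != masked_number            # True where the entry is kept

kept = values[mask]                       # 1-D array of the entries that survive
kept_per_row = mask.sum(axis=1)           # how many entries survive in each row

print(values)
print(mask)
print(kept, kept_per_row)
```

Changing `masked_number` and rerunning shows how the mask and the per-row counts move together, which is the same bookkeeping needed for the denominator of an accuracy.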

drew_Frances · May 18, 2022, 5:54am

Hi @SainiAnkit

I looked at the variables in a debugger. I believe I should be masking the labels, since I see padding there. Doing this gives the same values as the expected output. Still, I am very confused about how to compare the predictions to this mask. I’m looking at the previous assignment.

Cheers,
Drew
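
For comparing predictions against a mask built from the labels, a minimal NumPy sketch might look like the following. The padding id `PAD_ID = -1`, the function name `masked_accuracy`, and the toy logits are assumptions for illustration, and the sketch pools all unpadded tokens in the batch; it is not the assignment's reference solution.

```python
import numpy as np

PAD_ID = -1  # hypothetical padding id in the labels (an assumption)

def masked_accuracy(logits, labels, pad_id=PAD_ID):
    """Fraction of unpadded tokens whose argmax prediction matches the label."""
    preds = np.argmax(logits, axis=-1)   # (batch, seq_len) predicted tag ids
    mask = labels != pad_id              # built from the labels, not the predictions
    correct = (preds == labels) & mask   # padded positions are forced to False
    return correct.sum() / mask.sum()

# Toy input: 1 sentence, 3 positions, 2 tags; the last position is padding.
logits = np.array([[[0.9, 0.1],
                    [0.2, 0.8],
                    [0.7, 0.3]]])
labels = np.array([[0, 0, PAD_ID]])

acc = masked_accuracy(logits, labels)
```

The mask is applied to the elementwise comparison rather than to the predictions themselves, so a padded position can never count as a match, and the denominator counts only real tokens.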
