C3 Assignment 3 E4 Problem with understanding evaluate_prediction


I am stuck on figuring out accuracy in step #3; I don’t understand how the masking is used. I am looking at exercise 5 in assignment 3. I use mask = np.where(label != pad, ) to get the mask, and I am not sure why a different approach is used in assignment 3, exercise 5.

Assuming I am creating the mask properly, I don’t understand how to apply it so I can compare the labels with the predictions. I just need a simple explanation; there must be something simple I am missing.

lab id whacbnpw



Each training / evaluation batch has shape (num_examples, max_len), where max_len is the length of the longest sentence in the batch.
Sentences shorter than max_len get padded to fit the batch.

The model prediction has shape (num_examples, max_len, prob_per_ner_class).
When you compute outputs (the argmax over the class probabilities), it has shape (num_examples, max_len).
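As a quick sketch of those shapes (the sizes below are made up for illustration; the assignment's actual values differ):

```python
import numpy as np

# Hypothetical sizes, just for illustration
num_examples, max_len, num_classes = 2, 5, 17

# Model prediction: one probability per NER class at every position
pred = np.random.rand(num_examples, max_len, num_classes)

# argmax over the class axis collapses it to one predicted class id per position
outputs = np.argmax(pred, axis=-1)
print(pred.shape)     # (2, 5, 17)
print(outputs.shape)  # (2, 5)
```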


When making predictions, we only want to compare the positions that were not padded. This is what the mask is used for.


@balaji.ambresh I need help with how to do the masking. I am using mask = pred[:, :, 1] == pad but all the values come out False.


pred contains the probability of each class.
You have to use argmax to get the predicted classes, i.e. outputs.

Use labels to build the mask. To check whether each element in labels equals the padding token id, labels == pad is sufficient.
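A minimal sketch of building the mask from labels (the pad id 35180 is the value mentioned later in this thread; the token ids are invented):

```python
import numpy as np

pad = 35180  # padding token id, value quoted later in this thread
labels = np.array([[5, 2, 7, pad, pad]])  # made-up token ids

# True at real tokens, False at padding; flip to labels == pad if the
# assignment expects the mask to mark the padded positions instead
mask = labels != pad
print(mask)  # [[ True  True  True False False]]
```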


Hi @balaji.ambresh

Thanks for your help. I played around a bit and got the answer. What I didn’t quite get was how the mask interacts with the outputs and labels, and why the mask (and not the labels) is the denominator. I really have to look at it more so I fully understand.



@balaji.ambresh, then while computing accuracy, why is it wrong to compute it as np.sum(outputs * mask == labels * mask) / np.sum(mask)?

Why is the accuracy correct using [code removed -moderator]?

The numerator of the accuracy should compare only the non-padded elements, which we obtain by multiplying by the mask.



We don’t want to count the padded positions when calculating accuracy. Consider the example below: we want to compare only the first 3 positions, so the numerator should be 2 and not 5. Here’s a block of code that should clear things up for you:

import numpy as np

pad_id = 3000
labels = np.array([1, 2, 4, pad_id, pad_id, pad_id])
outputs = np.array([1, 2, 3, 10, 10, 10]) # the model outputs after argmax
mask = labels != pad_id
print(mask) # array([ True,  True,  True, False, False, False])
print(labels * mask) # array([1, 2, 4, 0, 0, 0])
print(outputs * mask) # array([1, 2, 3, 0, 0, 0])
print(np.sum(labels * mask == outputs * mask)) # 5 -- still wrong: padded positions match as 0 == 0

The reason why your approach works anyway is that the padding token will never be predicted by the model: the model predicts one of up to 17 labels, whereas the padding token id is 35180. So you are implicitly comparing only up to the padded lengths. Accuracy is in the range [0, 1]. Hope this clears up the purpose of the mask.
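For contrast, here is a sketch of the accuracy computed with the mask inside the numerator, using the same toy arrays as above:

```python
import numpy as np

pad_id = 3000
labels = np.array([1, 2, 4, pad_id, pad_id, pad_id])
outputs = np.array([1, 2, 3, 10, 10, 10])  # model outputs after argmax

mask = labels != pad_id

# Count matches only at the non-padded positions
correct = np.sum((outputs == labels) * mask)  # 2 (positions 0 and 1)
total = np.sum(mask)                          # 3 non-padded positions
print(correct / total)  # 0.666...
```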


Hello, I am completely lost on what the mask is doing. The assignment says the mask needs to be the same size as the output (I understand this, since it has to match the body that contains the text and padding). But when I tried mask = outputs == pad, it did not work, whereas mask = labels == pad worked and I passed all tests. How? The labels should not even have padding, right? Padding is only something we add before feeding the input to the model, so the target values we are comparing against should not be padded. Or am I missing something here? I am completely lost.


Adding @arvyzukai

Hi @zakharymg

The mask helps compute the “real” accuracy: it indicates where the pad characters are so that we do not count them.

Example sentence:
“The correct prediction <pad> <pad> <pad>”,

If we want to get the model’s accuracy, we need to compare only the words and not the <pad> tokens. So a 100% accurate model would predict 3/3 correct (not 6/6, nor 3/6, nor something else).
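In code form, a sketch of that sentence, using the words themselves as stand-ins for token ids:

```python
import numpy as np

labels  = np.array(["The", "correct", "prediction", "<pad>", "<pad>", "<pad>"])
outputs = np.array(["The", "correct", "prediction", "<pad>", "<pad>", "<pad>"])

# Only the 3 real words count toward the denominator
mask = labels != "<pad>"
acc = np.sum((outputs == labels) * mask) / np.sum(mask)
print(acc)  # 1.0 -> 3/3 correct, not 6/6
```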

For example:

  • bad model’s prediction (outputs after argmax):
    “The wrong words <pad> <pad> <pad>”
  • reasonably good model’s prediction (outputs after argmax):
    “This is correct sentence <pad> <pad>”
  • True labels:
    “This is the correct sentence <pad>”

I think this trivial example shows why you should count the pads in the labels but not in the outputs if you want the accuracy: the 100% accurate model would get 5/5 correct, and the reasonably good model would get 4/5 correct (but not 4/4).

Padding is needed for mini-batch processing. If you want to feed the model more than one sentence at a time, the input needs to be structured as a matrix, where every position holds a value (either an actual token or a <pad> value). This is a requirement for these types of models to work with mini-batch processing: you cannot do the matrix multiplication (inputs x weights) if the inputs are not a proper matrix. 🙂
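A sketch of why padding produces a rectangular batch (the pad id 0 and the token ids here are arbitrary):

```python
import numpy as np

pad = 0  # arbitrary pad id for this sketch
sentences = [[4, 7, 9], [3, 5], [8, 1, 2, 6]]  # token ids, unequal lengths

# Pad every sentence to the length of the longest one
max_len = max(len(s) for s in sentences)
batch = np.array([s + [pad] * (max_len - len(s)) for s in sentences])
print(batch.shape)  # (3, 4) -- now a proper matrix, ready for inputs x weights
print(batch)
```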

