I understand how to use a mask to ignore pad tokens and keep only the actual predictions. So the number of correct predictions should be `np.sum(outputs * mask == labels * mask)`. But what should the total number of predictions be then? I tried `np.sum(mask)`, but got an accuracy larger than 100%.
Printing the counts shows that my number of predictions is smaller than the number of correct predictions, so using `np.sum(mask)` must be wrong. But why? What should the correct denominator be?
My code, FYI:
```python
import numpy as np

mask = (labels != pad)  # True at real (non-pad) token positions
n_correct = np.sum(outputs * mask == labels * mask)
n_prediction = np.sum(mask)
print("no. of correct predictions:", n_correct)
print("total actual predictions:", n_prediction)
accuracy = n_correct / n_prediction
```
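A minimal sketch that reproduces the symptom, assuming `outputs` and `labels` are integer token-ID arrays of the same shape and `pad` is an integer ID; the toy values below are made up for illustration. At every padded position both `outputs * mask` and `labels * mask` are 0, so `0 == 0` is counted as a correct prediction, while `np.sum(mask)` leaves those positions out. That would explain why the numerator can exceed the denominator. Combining the element-wise comparison with the mask via `&` keeps both counts over the same positions.

```python
import numpy as np

# Hypothetical toy batch: 2 sequences padded to length 5, pad ID assumed to be 0.
pad = 0
labels = np.array([[5, 3, 8, pad, pad],
                   [2, 7, pad, pad, pad]])
outputs = np.array([[5, 1, 8, 4, 0],
                    [2, 7, 9, 9, 9]])

mask = labels != pad  # 5 real positions in total

# Formula from the question: padded positions become 0 == 0 and count as "correct".
n_correct_original = np.sum(outputs * mask == labels * mask)

# Count a hit only where the prediction matches AND the position is a real token.
n_correct = np.sum((outputs == labels) & mask)
n_prediction = np.sum(mask)

print(n_correct_original, n_correct, n_prediction)  # 9 4 5
print("accuracy:", n_correct / n_prediction)        # accuracy: 0.8
```

With this toy data the original formula gives 9 / 5 = 180%, while the masked comparison gives 4 / 5. An equivalent way to write the numerator is `np.sum((outputs == labels)[mask])`, which selects only the real positions before counting.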