This unit test indicates that you did not complete the function masked_accuracy
correctly.
Your function failed on 5 tests. For example, take the first check:
# true labels (y_true)
[array([[ 0, 2, -1, -1, 2]]),
# model's predictions (y_pred)
array([[[0.12812445, 0.99904052, 0.23608898],
[0.39658073, 0.38791074, 0.66974604],
[0.93553907, 0.84631092, 0.31327352],
[0.52454816, 0.44345289, 0.22957721],
[0.53441391, 0.91396202, 0.45720481]]])]
The expected accuracy is 0.33333:
- two predictions should not be counted because they are padding tokens (-1), so you’re left with 3 ([0,2,2]) on rows 1, 2, 5.
- row 1 predicts index 1 (0.999) but the real label is 0, so wrong
- row 2 predicts index 2 (0.669) and the real label is 2, so correct
- row 5 predicts index 1 (0.913) but the real label is 2, so wrong
So the expected accuracy is 1 / 3 = 0.3333. Your result is 1 for this case, and in some other cases you even got 2 or 3, which is impossible because an accuracy can never exceed 1.
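The counting steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the assignment's reference solution: the name masked_accuracy_sketch and the pad_label parameter are made up here, and it assumes padding positions are labelled -1 as in this check.

```python
import numpy as np

def masked_accuracy_sketch(y_true, y_pred, pad_label=-1):
    """Accuracy over non-padding positions only (illustrative helper)."""
    y_true = np.asarray(y_true)
    # Predicted class = index of the highest score along the last axis.
    y_pred_class = np.argmax(y_pred, axis=-1)
    # True where the label is a real token, False where it is padding.
    mask = y_true != pad_label
    # Count a match only where the mask is True.
    correct = np.sum((y_true == y_pred_class) & mask)
    # Divide by the number of real tokens, not by the sequence length.
    return correct / np.sum(mask)

# The first check from this post, hard-coded.
y_true = np.array([[0, 2, -1, -1, 2]])
y_pred = np.array([[[0.12812445, 0.99904052, 0.23608898],
                    [0.39658073, 0.38791074, 0.66974604],
                    [0.93553907, 0.84631092, 0.31327352],
                    [0.52454816, 0.44345289, 0.22957721],
                    [0.53441391, 0.91396202, 0.45720481]]])

print(masked_accuracy_sketch(y_true, y_pred))  # 1 correct out of 3 unmasked
```

Because the numerator only counts masked-in matches and the denominator is the number of masked-in positions, the result always lies between 0 and 1. The sketch does not handle a batch that contains only padding, where the denominator would be 0.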
To reproduce these numbers and test your implementation:
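import numpy as np
np.random.seed(1)
y_true = np.random.randint(-1, 3, size=(1, 5))  # labels drawn from {-1, 0, 1, 2}
y_pred = np.random.rand(1, 5, 3)  # one score per class for each of the 5 positions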
Cheers