I’m not sure what the hint is referring to here:
```
Multiply (1. - mask) by -1e9 before applying the softmax.
```
I don't recall having to use (1. - mask) anywhere. Rather, we add the mask to the scaled attention logits, following equation (4).
This is covered in the assignment instructions in the notebook.
Except all it says is what I quoted above in the question description. I’m not sure why you would need to use (1. - mask) * (-1e9) in this exercise.
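It is because of how the mask is used in this assignment. The code converts mask values of 0 and 1 into a “very large negative value” and zero, respectively. Adding that result to the scaled attention logits leaves the positions with mask value 1 unchanged and pushes the positions with mask value 0 to effectively zero weight after the softmax.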
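To make the conversion concrete, here is a minimal NumPy sketch. It assumes the mask uses 1 for positions to keep and 0 for positions to hide; `masked_softmax` and the sample values are hypothetical and are not the assignment's own code.

```python
import numpy as np

def masked_softmax(scaled_logits, mask):
    """Softmax over the last axis, ignoring positions where mask == 0.

    Assumed convention: mask is 1 for positions to keep, 0 for positions to hide.
    """
    # (1 - mask) flips the mask: kept positions become 0, hidden ones become 1.
    # Multiplying by -1e9 turns that into 0 for kept and -1e9 for hidden positions.
    additive_mask = (1.0 - mask) * -1e9
    logits = scaled_logits + additive_mask
    # Subtract the row maximum for numerical stability; the softmax is unchanged.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

scaled_logits = np.array([[1.0, 2.0, 3.0]])
mask = np.array([[1.0, 1.0, 0.0]])  # hide the third position
weights = masked_softmax(scaled_logits, mask)
print(weights)
```

The third weight comes out as zero and the first two sum to one. If the mask were added directly without the `(1. - mask) * -1e9` step, the hidden position would only have its logit shifted by 0 or 1 and would still receive attention.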