C5W4A1 Exercise 3 - scaled_dot_product_attention

Could you please elaborate on what mask = np.array([0, 0, 1, 0]) means?
I have been able to pass all the tests for this exercise by hard-coding the part related to
(# add the mask to the scaled tensor) to make it work. However, I am not able to understand what this mask exactly means, and hence don't know how to apply it in my code…

I have read the explanations about the padding mask and the look-ahead mask. But in both cases there was no mask given as an input to the functions create_padding_mask or create_look_ahead_mask; we only give the information about the sequence that we would like to mask.
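Right — those helper functions build the mask from the sequence information alone. For instance, a look-ahead mask needs only the sequence length. A minimal NumPy sketch, assuming the tutorial's convention that 1 marks a position to be blocked:

```python
import numpy as np

# Sketch of a look-ahead mask (tutorial-style convention: 1 = blocked).
# Only the length is needed: position i must not attend to any
# future position j > i, i.e. the strict upper triangle is masked.
def create_look_ahead_mask(size):
    return np.triu(np.ones((size, size)), k=1)

mask = create_look_ahead_mask(3)
# array([[0., 1., 1.],
#        [0., 0., 1.],
#        [0., 0., 0.]])
```

The actual course/tutorial versions are written with TensorFlow ops, but the logic is the same: the mask is derived entirely from the input, which is why no mask argument appears in those functions' signatures.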

TensorFlow provides a Transformer tutorial, which explains how to create and use masks in the Transformer.
However, the tf.keras MultiHeadAttention layer API is not consistent with the tutorial. The MultiHeadAttention layer API says "1 indicates attention and 0 indicates no attention", but the tutorial (as well as our create_padding_mask and create_look_ahead_mask) uses the opposite convention. You can check here to see how to create masks for the MultiHeadAttention layer API.
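To illustrate the flipped conventions, here is a minimal NumPy sketch — this create_padding_mask mirrors the tutorial's logic but without the extra broadcasting axes:

```python
import numpy as np

# Tutorial-style padding mask: 1 marks padding tokens (positions to block).
def create_padding_mask(seq):
    return (seq == 0).astype(np.float32)

seq = np.array([[7, 6, 0, 0, 1]])
tutorial_mask = create_padding_mask(seq)  # [[0., 0., 1., 1., 0.]] -> 1 = blocked

# The Keras MultiHeadAttention convention is the opposite (1 = attend),
# so the tutorial-style mask must be inverted before being passed there.
keras_mask = 1.0 - tutorial_mask          # [[1., 1., 0., 0., 1.]] -> 1 = attend
```

The inversion `1.0 - mask` is all it takes to translate between the two conventions.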


Still confused. For scaled_attention_logits, how is the mask applied? Can someone point to the exact formula used to update it?

Figured it out.

The big hint was in the notebook… Multiply (1. - mask) by -1e9

Need to increment scaled_attention_logits by that amount.
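Putting the hint together, a minimal NumPy sketch (the logit values here are made up for illustration) shows why this works: after the update, softmax assigns essentially zero weight to every blocked position.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical attention scores for one query over 4 keys.
scaled_attention_logits = np.array([0.5, 1.2, -0.3, 0.8])

# The mask from the question: under the notebook's convention
# (1 = attend, 0 = block), only position 2 is attended to.
mask = np.array([0, 0, 1, 0])

# The notebook's hint: push blocked positions toward -infinity
# so softmax gives them ~0 probability.
scaled_attention_logits += (1. - mask) * -1e9

weights = softmax(scaled_attention_logits)
# weights ~ [0., 0., 1., 0.]
```

Where mask is 1, `(1. - mask) * -1e9` is 0 and the logit is untouched; where mask is 0, the logit is shifted by -1e9, which softmax turns into a weight of effectively zero.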


@yoda_chen - I am still struggling with this; my error is below.

The hint in the notebook comes in the line:

scaled_attention_logits += (.....)

Is this correct?

Any hints, @Mubsi? I am really trying to pass this course before my subscription expires!

No, look at the lines following "Exercise 3"; they start with "Reminder: The boolean mask parameter…"