C5W4A1 Exercise 3 - scaled_dot_product_attention

Could you please elaborate on what mask = np.array([0, 0, 1, 0]) means?
I have been able to pass all the tests for this exercise by hard-coding the part related to
(# add the mask to the scaled tensor) to make it work. However, I am not able to understand what this mask exactly means, and hence don't know how to apply it in my code…

I have read the explanations about the padding mask and the look-ahead mask. But in both cases there was no mask given as an input to the functions create_padding_mask or create_look_ahead_mask; we only give the information about the sequence that we would like to mask.
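Right — those helper functions build the mask from the sequence information alone. For instance, a look-ahead mask needs only the sequence length. A minimal NumPy sketch, assuming the tutorial's convention that 1 marks a position to be blocked:

```python
import numpy as np

# Sketch of a look-ahead mask (tutorial-style convention: 1 = blocked).
# Only the length is needed: position i must not attend to any
# future position j > i, i.e. the strict upper triangle is masked.
def create_look_ahead_mask(size):
    return np.triu(np.ones((size, size)), k=1)

mask = create_look_ahead_mask(3)
# array([[0., 1., 1.],
#        [0., 0., 1.],
#        [0., 0., 0.]])
```

The actual course/tutorial versions are written with TensorFlow ops, but the logic is the same: the mask is derived entirely from the input, which is why no mask argument appears in those functions' signatures.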

TensorFlow provides a Transformer tutorial, which explains how to create and use masks in the Transformer.
However, the tf.keras MultiHeadAttention layer API is not consistent with the tutorial. The MultiHeadAttention layer API says "1 indicates attention and 0 indicates no attention", but the tutorial (as well as our create_padding_mask and create_look_ahead_mask) uses the opposite convention. You can check here to see how to create masks for the MultiHeadAttention layer API.
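To illustrate the flipped conventions, here is a minimal NumPy sketch — this create_padding_mask mirrors the tutorial's logic but without the extra broadcasting axes:

```python
import numpy as np

# Tutorial-style padding mask: 1 marks padding tokens (positions to block).
def create_padding_mask(seq):
    return (seq == 0).astype(np.float32)

seq = np.array([[7, 6, 0, 0, 1]])
tutorial_mask = create_padding_mask(seq)  # [[0., 0., 1., 1., 0.]] -> 1 = blocked

# The Keras MultiHeadAttention convention is the opposite (1 = attend),
# so the tutorial-style mask must be inverted before being passed there.
keras_mask = 1.0 - tutorial_mask          # [[1., 1., 0., 0., 1.]] -> 1 = attend
```

The inversion `1.0 - mask` is all it takes to translate between the two conventions.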


Still confused. For scaled_attention_logits, how is the mask applied? Can someone point to the exact formula used to update it?

Figured it out.

The big hint was in the notebook… Multiply (1. - mask) by -1e9

Need to increment scaled_attention_logits by that amount.
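Putting the hint together, a minimal NumPy sketch (the logit values here are made up for illustration) shows why this works: after the update, softmax assigns essentially zero weight to every blocked position.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical attention scores for one query over 4 keys.
scaled_attention_logits = np.array([0.5, 1.2, -0.3, 0.8])

# The mask from the question: under the notebook's convention
# (1 = attend, 0 = block), only position 2 is attended to.
mask = np.array([0, 0, 1, 0])

# The notebook's hint: push blocked positions toward -infinity
# so softmax gives them ~0 probability.
scaled_attention_logits += (1. - mask) * -1e9

weights = softmax(scaled_attention_logits)
# weights ~ [0., 0., 1., 0.]
```

Where mask is 1, `(1. - mask) * -1e9` is 0 and the logit is untouched; where mask is 0, the logit is shifted by -1e9, which softmax turns into a weight of effectively zero.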


@yoda_chen - I am still struggling with this; my error is below.

The hint in the notebook comes in the line:

scaled_attention_logits += (.....)

Is this correct?

Any hints, @Mubsi? I am really trying to pass this course before my subscription expires!

No, look at the lines following "Exercise 3"; they start with "Reminder: The boolean mask parameter…"