Transformer Summarizer C4W2_Assignment Exercise 1 - scaled_dot_product_attention

Hi all.
I am getting the wrong output with Exercise 1 - scaled_dot_product_attention. Here is my code’s output:
Output:
[[[0.38 0. ]
[1. 0. ]
[1. 0. ]]]
Attention weigths:
[[[0.62 0. 0.38 0. 0. ]
[0. 0.5 0.5 0. 0. ]
[0. 0. 1. 0. 0. ]]]

This doesn’t match the expected output. I have checked the output shape and it seems to be correct: (3, 2) for the first test and (1, 3, 2) for the second test (per TMOSH in another comment).
For the attention_weights, I am using tf.nn.softmax instead of tf.keras.activations.softmax. For the mask, I am multiplying it by -1e9 before adding it to the scaled_attention_logits.

The rest of the “None” placeholders seemed straightforward. Any suggestions?

PS - Can someone fix the incorrect spelling of “weigths” in the ‘Test Your Function’ cell?
Thanks!
John

Check how your code handles the mask. It’s not mask * -1e9, right? They wrote the expression out for you in the instructions.
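
For anyone who lands on this thread later, here is a minimal, illustrative sketch of that step, assuming the convention that the mask holds 1 for positions to keep and 0 for positions to ignore. The function and variable names here are just for illustration, not the graded code:

```python
import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    """Illustrative sketch only -- assumes mask entries are 1 for
    positions to keep and 0 for positions to ignore."""
    # Similarity scores: Q K^T, scaled by sqrt(d_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)          # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Masked-out positions get a large negative logit so softmax drives
    # their weight to ~0. Note the (1 - mask) factor, not mask alone.
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # Normalize over the key dimension; tf.nn.softmax works fine here.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)                # (..., seq_len_q, depth_v)
    return output, attention_weights
```

With a 1/0 mask of that kind, a 0 entry becomes a -1e9 logit, which is why multiplying the mask itself by -1e9 suppresses the wrong positions.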

Paul - Thanks, I misread that! I had (mask * -1e9)…
Thank you. (Slaps self for reading only part of an instruction line…)


No worries! I’m sure we’ve all “been there”. And every time we relearn the “meta” lesson that “saving time” by skimming the instructions almost never saves time in the end. :laughing: