Transformer Summarizer C4W2_Assignment Exercise 1 - scaled_dot_product_attention

Hi all.
I am getting the wrong output with Exercise 1 - scaled_dot_product_attention. Here is my code’s output:
Output:
[[[0.38 0. ]
[1. 0. ]
[1. 0. ]]]
Attention weigths:
[[[0.62 0. 0.38 0. 0. ]
[0. 0.5 0.5 0. 0. ]
[0. 0. 1. 0. 0. ]]]

This doesn’t match the expected output. I have checked the output shape and it seems to be correct: (3, 2) for the first test and (1, 3, 2) for the second test (per TMOSH in another comment).
For the attention_weights, I am using tf.nn.softmax instead of tf.keras.activations.softmax. For the mask, I am multiplying it by -1e9 before adding it to the scaled_attention_logits.

The rest of the “None” placeholders seemed straightforward. Any suggestions?

PS - Can someone fix the incorrect spelling of “weigths” in the ‘Test Your Function’ cell?
Thanks!
John

Check how your code handles the mask. It’s not mask * -1e9, right? They wrote the expression out for you in the instructions.
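
For anyone who lands on this thread later, here is a minimal, illustrative sketch of that step, assuming the convention that the mask holds 1 for positions to keep and 0 for positions to ignore. The function and variable names here are just for illustration, not the graded code:

```python
import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    """Illustrative sketch only -- assumes mask entries are 1 for
    positions to keep and 0 for positions to ignore."""
    # Similarity scores: Q K^T, scaled by sqrt(d_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)          # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Masked-out positions get a large negative logit so softmax drives
    # their weight to ~0. Note the (1 - mask) factor, not mask alone.
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # Normalize over the key dimension; tf.nn.softmax works fine here.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)                # (..., seq_len_q, depth_v)
    return output, attention_weights
```

With a 1/0 mask of that kind, a 0 entry becomes a -1e9 logit, which is why multiplying the mask itself by -1e9 suppresses the wrong positions.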

Paul - Thanks, I misread that! I had (mask * -1e9)…
Thank you. (Slaps self for reading only part of an instruction line…)


No worries! I’m sure we’ve all “been there”. And every time we relearn the “meta” lesson that “saving time” by skimming the instructions almost never saves time in the end. :laughing: