C5W4 Ex3 scaled_dot_product_attention - wrong masked weights

Hi,

I am getting quite confused by the implementation of Ex3. My implementation of this method throws the “Wrong masked weights” assertion error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-40-00665b20febb> in <module>
      1 # UNIT TEST
----> 2 scaled_dot_product_attention_test(scaled_dot_product_attention)

~/work/W4A1/public_tests.py in scaled_dot_product_attention_test(target)
     73     assert np.allclose(weights, [[0.30719590187072754, 0.5064803957939148, 0.0, 0.18632373213768005],
     74                                  [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862],
---> 75                                  [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862]]), "Wrong masked weights"
     76     assert np.allclose(attention, [[0.6928040981292725, 0.18632373213768005],
     77                                    [0.6163482666015625, 0.2326965481042862],

AssertionError: Wrong masked weights

I am not quite sure what is causing this. To avoid posting code, here is a summary of my implementation:

  • Multiplied the query matrix by the transpose of the key matrix using matmul
  • Initialised dk with the depth dimension of k
  • Implemented the mask
  • Used the TensorFlow softmax function on the scaled attention logits

This leads me to believe that the problem might actually be in my Exercise 1 get_angles implementation.

Any help is appreciated!

Errors in get_angles() are unrelated to scaled_dot_product_attention().

Your summary is missing a few steps of computation.

Yep! Just solved the get_angles error (silly mistake) and have edited it out of the question.

Ex3 summary with the missing steps in bold (a rough code sketch follows the list):

  • Multiplied the query matrix by the transpose of the key matrix using matmul
  • Initialised dk with the depth dimension of k
  • scaled_attention_logits set to the division of matmul_qk and the square root of dk
  • Implemented the mask
  • Used the TensorFlow softmax function on the last (seq_len_k) axis of the scaled_attention_logits
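In code, the steps above look roughly like the sketch below. It is modelled on the public TensorFlow Transformer tutorial rather than copied from my graded cell, so the function and variable names (matmul_qk, scaled_attention_logits, and so on) are just the tutorial's, and the mask line follows the tutorial's convention:

import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    # Multiply the query matrix by the transposed key matrix
    matmul_qk = tf.matmul(q, k, transpose_b=True)            # (..., seq_len_q, seq_len_k)

    # dk is the depth (last dimension) of k; scale the logits by sqrt(dk)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Tutorial convention: positions where mask == 1 are pushed towards -infinity
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    # Softmax over the last axis (seq_len_k) so each row of weights sums to 1
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    # Weighted sum of the value vectors
    output = tf.matmul(attention_weights, v)                 # (..., seq_len_q, depth_v)
    return output, attention_weights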

Managed to pass all other test cases for the entire assignment. However, due to this single assertion error, the grader returns a grade of 0/100.

Any help with this is greatly appreciated.

Are you using numpy functions, or TensorFlow functions?

Thanks for the reply.
I am using TensorFlow functions:

  • tf.linalg.matmul
  • tf.cast(), tf.shape()
  • tf.math.sqrt()
  • tf.nn.softmax() – also tried tf.keras.layers.Softmax(axis)(logits) with the same result

I have also tried restarting the kernel just in case.
My method is basically the same as the one presented in the TensorFlow Transformer tutorial.

The attention weights output by my implementation are:

attention_weights:
[[0.2589478  0.42693272 0.15705977 0.15705977]
 [0.2772748  0.2772748  0.2772748  0.16817567]
 [0.33620113 0.33620113 0.12368149 0.2039163 ]]

The output expected by the test case is shown below; note that the third column of the expected weights is exactly 0.0 in every row, whereas nothing is zeroed out in my output:

[[0.30719590187072754, 0.5064803957939148, 0.0, 0.18632373213768005],
 [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862],
 [0.3836517333984375, 0.3836517333984375, 0.0, 0.2326965481042862]]

Try tf.keras.activations.softmax(), and the argument is the scaled attention logits.
I think (axis)(logits) is the wrong syntax.

Using tf.keras.activations.softmax() also resulted in the same output.

Interestingly, I didn’t get a syntax error with tf.keras.layers.Softmax(axis)(logits), but I nevertheless also tried tf.keras.layers.Softmax()(logits) (since axis=-1 is the default anyway).
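For completeness, the two call patterns I tried look like this (a minimal stand-alone sketch; logits here just stands in for my scaled attention logits):

import tensorflow as tf

logits = tf.constant([[1.0, 2.0, 3.0, 4.0]])

# Layer style: configure the layer first, then call it on the data
weights_layer = tf.keras.layers.Softmax(axis=-1)(logits)

# Function style: pass the data (and optionally the axis) directly
weights_fn = tf.keras.activations.softmax(logits, axis=-1)

# Both produce the same row-wise softmax, summing to 1 along the last axis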

Having no idea what else to try, I even copy-pasted the relevant parts of the code from the TensorFlow tutorial, but the output from the method is still exactly the same.

When you’re using the functional API, the data goes inside the parentheses as the argument, not outside in a separate call.

Yes, that is what I have done for the method you suggested. What I currently have:
attention_weights = tf.keras.activations.softmax(scaled_attention_logits)
This still results in the assertion error, because the output is still the same as posted above.

Please note: passing the input in a separate call is exactly what the documentation for tf.keras.layers.Softmax() shows:

layer = tf.keras.layers.Softmax()   # axis=-1 by default
layer(inp)                          # inp is the input tensor (here, the scaled attention logits)

Check if you are using (1 - mask) when you add the mask to the scaled attention logits.
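In other words, assuming the assignment's convention where a 1 in the mask means "attend to this position" and a 0 means "block it" (the opposite of the tutorial), the mask step would look roughly like:

if mask is not None:
    # Assumed convention: mask == 1 keeps a position, 0 blocks it,
    # so invert the mask before adding the large negative value
    scaled_attention_logits += (1. - mask) * -1e9

That way the blocked key position ends up with a weight of effectively 0 after the softmax, which is what the zero column in the expected weights shows.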
