C5_W4_A1 Decoder_layer self.mha1

Basically, I am getting this error:
AssertionError: Wrong values in attn_w_b1. Check the call to self.mha1

for this line of code:
mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)

This is basically the same as the corresponding call in the encoder section.
I'm rather scratching my head here.
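For anyone following along, the call pattern `self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)` is self-attention: the same tensor is used as query, key and value, and the mask decides which key positions each query may see. A minimal single-head NumPy sketch of that idea (an illustration only, not the internals of Keras `MultiHeadAttention`; the function name and shapes here are my own):

```python
import numpy as np

def self_attention(x, mask):
    """Single-head scaled dot-product self-attention.

    x    -- (seq_len, d_model) input; used as query, key and value,
            mirroring the self.mha1(x, x, x, ...) call.
    mask -- (seq_len, seq_len) with 1 = may attend, 0 = blocked.
    Returns (output, attention_weights), like return_attention_scores=True.
    """
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)             # (seq_len, seq_len) logits
    scores = np.where(mask == 1, scores, -1e9)  # blocked positions -> ~ -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x, weights

x = np.arange(6.0).reshape(3, 2)
causal = np.tril(np.ones((3, 3)))  # lower-triangular look-ahead mask
out, w = self_attention(x, causal)
print(np.round(w, 3))
# row 0 can only attend to itself, so w[0] == [1., 0., 0.]
```

With a correct causal mask, every row of the attention weights is zero strictly above the diagonal, which is the property the unit test checks.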


Did you modify any of the code in the constructor part of the DecoderLayer() class?

And, does your create_look_ahead_mask() function work correctly?
Please post the output for the unit test for that function.

You mean the __init__()? No, of course not.

def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
    super(DecoderLayer, self).__init__()

    self.mha1 = MultiHeadAttention(num_heads=num_heads,

    self.mha2 = MultiHeadAttention(num_heads=num_heads,

    self.ffn = FullyConnected(embedding_dim=embedding_dim,

    self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
    self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)
    self.layernorm3 = LayerNormalization(epsilon=layernorm_eps)

    self.dropout_ffn = Dropout(dropout_rate)

<tf.Tensor: shape=(1, 3, 3), dtype=float32, numpy=
array([[[0., 1., 1.],
[0., 0., 1.],
[0., 0., 0.]]], dtype=float32)>
The output doesn’t look right. The Out[code line] marker is red.

Your mask value is not correct.
Please post the code you added to that function.

I would like to point to the sentence above the mask in the exercise:
“Just because you’ve worked so hard, we’ll also implement this mask for you :innocent::innocent:. Again, take a close look at the code so you can effectively implement it later.”

def create_look_ahead_mask(sequence_length):
    """
    Returns an upper triangular matrix filled with ones

    Arguments:
        sequence_length -- matrix size

    Returns:
        mask -- (size, size) tensor
    """
    mask = 1 - tf.linalg.band_part(tf.ones((1, sequence_length, sequence_length)), -1, 0)
    return mask

I don’t remember changing anything. Also, since I just noticed the differing dimensions, I ran it without the extra leading dimension in tf.ones, which gives the same result, both in the output below the function and later on.

I looked into public_tests.py, and it seems the values the tester is looking for do sort of exist, but not at the position attn_w_b1[0, 0, 1] that it checks:
print(attn_w_b1[0, 0, 1]) = tf.Tensor([0. 0. 1.], shape=(3,), dtype=float32)
print(attn_w_b1[0, 0, 0]) = tf.Tensor([0. 0.49384946 0.50615054], shape=(3,), dtype=float32)
compared with the tester’s expected value [0.5271505, 0.47284946, 0.]
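That mismatch pattern (nonzero weights on future positions, zeros on past ones) is exactly what an inverted mask produces: Keras `MultiHeadAttention` treats `attention_mask` entries of 1 as “may attend” and 0 as “blocked”, so an extra “1 - ” flips which side of the diagonal is zeroed. A minimal NumPy sketch of the effect (uniform logits are assumed purely for illustration; this is not the actual layer internals):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Row-wise softmax where positions with mask == 0 get ~ -inf logits."""
    z = np.where(mask == 1, logits, -1e9)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.zeros((3, 3))            # uniform logits, illustration only
correct = np.tril(np.ones((3, 3)))   # 1 = may attend: self + past keys
inverted = 1 - correct               # the accidental "1 - " version

w_ok = masked_softmax(logits, correct)
w_bad = masked_softmax(logits, inverted)
print(np.round(w_ok[1], 2))   # -> [0.5 0.5 0. ]  query 1 sees keys 0 and 1
print(np.round(w_bad[0], 2))  # -> [0.  0.5 0.5]  query 0 sees only the future
```

Note that the last row of the inverted mask blocks every key, so its softmax degenerates to a uniform distribution, which is another sign something is off.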

How should the look_ahead_mask look?

Do you see the "1 - " in the line of code for “mask = …”?

That "1 - " should not be there. It isn’t there in the current version of that notebook.

Either you’re using an obsolete copy of the notebook, or you added it yourself.
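For completeness: `tf.linalg.band_part(x, -1, 0)` keeps everything on and below the main diagonal, so without the “1 - ” the function returns a lower triangular matrix of ones. A NumPy stand-in for the corrected function (whether the notebook adds a leading batch dimension of 1 varies between versions, so that is left out here):

```python
import numpy as np

def create_look_ahead_mask(sequence_length):
    """Lower-triangular mask: 1 = position may be attended to.

    NumPy equivalent of
        tf.linalg.band_part(tf.ones((sequence_length, sequence_length)), -1, 0)
    which keeps everything on or below the main diagonal.
    """
    return np.tril(np.ones((sequence_length, sequence_length), dtype=np.float32))

print(create_look_ahead_mask(3))
# [[1. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
```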


Ok, found it. You are right. Thank you!
In this case the code returns this:
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
[1., 1., 0.],
[1., 1., 1.]], dtype=float32)>

That is a lower triangular matrix, which might be why I really did change the code in the first place: it differs from the version in [Transformer model for language understanding | Text | TensorFlow].
God, this is silly.
Ok, on towards the end and hoping for the best, when I actually need to reproduce it somewhere else.
Thank you, TMosh!

This exercise has been updated several times to sort out this type of discrepancy.

I forced a new download yesterday and did so again just now. The wrong comment is still there, as is the wrong matrix dimension (according to the comments as well). Is the dimension correct or not? Who knows?
This is misleading and cost me a lot of time.
Either the revision process isn’t thorough, or the creators didn’t know their maths at the time. Doubling down is a very strange strategy.

I believe there is an update pending for this exercise. It hasn’t been published yet.