C5_W4_A1 Decoder_layer self.mha1

Basically, I am getting this error:
AssertionError: Wrong values in attn_w_b1. Check the call to self.mha1

for this line of code:
mult_attn_out1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)

This is basically the same as the corresponding call in the encoder section.
I'm rather scratching my head here.
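For anyone following along, the call pattern `self.mha1(x, x, x, look_ahead_mask, return_attention_scores=True)` is self-attention: the same tensor is used as query, key and value, and the mask decides which key positions each query may see. A minimal single-head NumPy sketch of that idea (an illustration only, not the internals of Keras `MultiHeadAttention`; the function name and shapes here are my own):

```python
import numpy as np

def self_attention(x, mask):
    """Single-head scaled dot-product self-attention.

    x    -- (seq_len, d_model) input; used as query, key and value,
            mirroring the self.mha1(x, x, x, ...) call.
    mask -- (seq_len, seq_len) with 1 = may attend, 0 = blocked.
    Returns (output, attention_weights), like return_attention_scores=True.
    """
    d_k = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_k)             # (seq_len, seq_len) logits
    scores = np.where(mask == 1, scores, -1e9)  # blocked positions -> ~ -inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ x, weights

x = np.arange(6.0).reshape(3, 2)
causal = np.tril(np.ones((3, 3)))  # lower-triangular look-ahead mask
out, w = self_attention(x, causal)
print(np.round(w, 3))
# row 0 can only attend to itself, so w[0] == [1., 0., 0.]
```

With a correct causal mask, every row of the attention weights is zero strictly above the diagonal, which is the property the unit test checks.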


Did you modify any of the code in the constructor part of the DecoderLayer() class?

And, does your create_look_ahead_mask() function work correctly?
Please post the output for the unit test for that function.

You mean the __init__()? No, of course not.

def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1, layernorm_eps=1e-6):
    super(DecoderLayer, self).__init__()

    self.mha1 = MultiHeadAttention(num_heads=num_heads,

    self.mha2 = MultiHeadAttention(num_heads=num_heads,

    self.ffn = FullyConnected(embedding_dim=embedding_dim,

    self.layernorm1 = LayerNormalization(epsilon=layernorm_eps)
    self.layernorm2 = LayerNormalization(epsilon=layernorm_eps)
    self.layernorm3 = LayerNormalization(epsilon=layernorm_eps)

    self.dropout_ffn = Dropout(dropout_rate)

<tf.Tensor: shape=(1, 3, 3), dtype=float32, numpy=
array([[[0., 1., 1.],
[0., 0., 1.],
[0., 0., 0.]]], dtype=float32)>
The output doesn’t look right. The Out[code line] marker is red.

Your mask value is not correct.
Please post the code you added to that function.

I would like to point to the sentence above the mask in the exercise:
“Just because you’ve worked so hard, we’ll also implement this mask for you :innocent::innocent:. Again, take a close look at the code so you can effectively implement it later.”

def create_look_ahead_mask(sequence_length):
    """
    Returns an upper triangular matrix filled with ones

    Arguments:
        sequence_length -- matrix size

    Returns:
        mask -- (size, size) tensor
    """
    mask = 1 - tf.linalg.band_part(tf.ones((1, sequence_length, sequence_length)), -1, 0)
    return mask

I don’t remember changing anything. Also, since I just noticed the differing dimensions, I ran it without the extra leading dimension in tf.ones, which gives the same result, both in the output below the function and later on.

I looked into public_tests.py, and it seems the values the tester is looking for do sort of exist, but not at the position attn_w_b1[0, 0, 1] that it checks:
print(attn_w_b1[0, 0, 1]) = tf.Tensor([0. 0. 1.], shape=(3,), dtype=float32)
print(attn_w_b1[0, 0, 0]) = tf.Tensor([0. 0.49384946 0.50615054], shape=(3,), dtype=float32)
compared with the tester’s expected value [0.5271505, 0.47284946, 0.]
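That mismatch pattern (nonzero weights on future positions, zeros on past ones) is exactly what an inverted mask produces: Keras `MultiHeadAttention` treats `attention_mask` entries of 1 as “may attend” and 0 as “blocked”, so an extra “1 - ” flips which side of the diagonal is zeroed. A minimal NumPy sketch of the effect (uniform logits are assumed purely for illustration; this is not the actual layer internals):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Row-wise softmax where positions with mask == 0 get ~ -inf logits."""
    z = np.where(mask == 1, logits, -1e9)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.zeros((3, 3))            # uniform logits, illustration only
correct = np.tril(np.ones((3, 3)))   # 1 = may attend: self + past keys
inverted = 1 - correct               # the accidental "1 - " version

w_ok = masked_softmax(logits, correct)
w_bad = masked_softmax(logits, inverted)
print(np.round(w_ok[1], 2))   # -> [0.5 0.5 0. ]  query 1 sees keys 0 and 1
print(np.round(w_bad[0], 2))  # -> [0.  0.5 0.5]  query 0 sees only the future
```

Note that the last row of the inverted mask blocks every key, so its softmax degenerates to a uniform distribution, which is another sign something is off.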

How should the look_ahead_mask look?

Do you see the "1 - " in the line of code for “mask = …”?

That "1 - " should not be there. It isn’t there in the current version of that notebook.

Either you’re using an obsolete copy of the notebook, or you added it yourself.
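For completeness: `tf.linalg.band_part(x, -1, 0)` keeps everything on and below the main diagonal, so without the “1 - ” the function returns a lower triangular matrix of ones. A NumPy stand-in for the corrected function (whether the notebook adds a leading batch dimension of 1 varies between versions, so that is left out here):

```python
import numpy as np

def create_look_ahead_mask(sequence_length):
    """Lower-triangular mask: 1 = position may be attended to.

    NumPy equivalent of
        tf.linalg.band_part(tf.ones((sequence_length, sequence_length)), -1, 0)
    which keeps everything on or below the main diagonal.
    """
    return np.tril(np.ones((sequence_length, sequence_length), dtype=np.float32))

print(create_look_ahead_mask(3))
# [[1. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
```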


Ok, found it. You are right. Thank you!
In this case the code returns this:
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
[1., 1., 0.],
[1., 1., 1.]], dtype=float32)>

That is a lower triangular matrix, which might be why I really did change the code in the first place: it differs from the version in [Transformer model for language understanding | Text | TensorFlow].
God, this is silly.
Ok, on towards the end and hoping for the best, when I actually need to reproduce it somewhere else.
Thank you, TMosh!

This exercise has been updated several times to sort out this type of discrepancy.

I forced a new download yesterday and did so again just now. The wrong comment is still there, as is the wrong matrix dimension (according to the comments as well). Is the dimension correct or not? Who knows?
This is misleading and cost me a lot of time.
Either the revision process isn’t thorough, or the creators didn’t know their maths at the time. Doubling down is a very strange strategy.

I believe there is an update pending for this exercise. It hasn’t been published yet.