C4W2 cannot be graded

I have a problem with my submission although I followed the hints:
There was a problem compiling the code from your notebook. Details:
Exception encountered when calling layer ‘softmax_3’ (type Softmax).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [1,2,2,150] vs. [1,1,1,2] [Op:AddV2] name:

Call arguments received by layer ‘softmax_3’ (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)

Do you know the posting rules? You are not supposed to publish code solutions!

Where you compute attention_weights, try it without the axis parameter!

Thank you, and sorry. I read a lot of posts about the same problem, but nobody could tell what the error was without seeing the code.
Cheers

Yes, next time ask them to send it in private!

But it still doesn’t work :frowning:

I fixed it like you suggested, but it still doesn’t work.

Send me the entire notebook in private and let me have a look at it…

Has anybody done this assignment? :frowning: I don’t know how to fix it.

Maybe try resetting the notebook and redoing the entire assignment; sometimes the problems are found by going through it again. But keep your current solutions so you can reuse them!

For future learners: the OP’s mistake was in defining the dec_padding_mask.

Note that when creating the padding mask for the decoder’s second attention block, we use the encoder_input. In other words, we tell the decoder not to pay attention to the padding tokens of the document being summarized.

Also note that this is different from the look_ahead_mask (causal mask), where the decoder is only allowed to pay attention to the current and previous tokens of its own output.
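
To make the shapes concrete, here is a minimal sketch (not the graded code; the notebook’s create_padding_mask may differ in its exact sign and shape conventions) showing why the mask should be built from the encoder input:

```python
import tensorflow as tf

def create_padding_mask(seq):
    # Simplified stand-in for the notebook's helper: 1.0 where the token id is
    # NOT padding (id 0), with extra dims so the mask can broadcast over
    # attention logits of shape (batch, heads, q_len, k_len).
    return tf.cast(tf.math.not_equal(seq, 0), tf.float32)[:, tf.newaxis, tf.newaxis, :]

encoder_input = tf.ones((1, 150), dtype=tf.int32)  # padded document: 150 token ids
output = tf.ones((1, 2), dtype=tf.int32)           # summary generated so far: 2 token ids

# Correct: the decoder's second attention block attends to the encoder output,
# so its padding mask is built from encoder_input -> shape (1, 1, 1, 150).
dec_padding_mask = create_padding_mask(encoder_input)
print(dec_padding_mask.shape)  # (1, 1, 1, 150)

# Wrong: building it from the decoder output gives shape (1, 1, 1, 2), which
# cannot broadcast against attention logits of shape (1, 2, 2, 150) -- exactly
# the "Incompatible shapes" error above.
# dec_padding_mask = create_padding_mask(output)
```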

Cheers

Thank you so much! I was getting the same error, and this solved it.

re: “when creating the padding mask for the decoder’s second attention block, we use the encoder_input”

Then is the following comment in the Transformer class definition in the notebook incorrect?

call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
    """
    Forward pass for the entire Transformer
    Arguments:
        input_sentence (tf.Tensor): Tensor of shape …
        …
        dec_padding_mask (tf.Tensor): Boolean mask for the second multihead attention layer

The comment says that we should use the dec_padding_mask for the second MHA layer, but in reality we should be using the enc_padding_mask. Am I understanding you correctly?

Thank you.

I don’t know if you are asking about the same chunk of code, but if so, I don’t think your interpretation is correct. I believe the earlier hint that Arvydas gave there refers to the next_word function. In that logic you create the dec_padding_mask by calling the create_padding_mask function, and the question is what argument you pass to that function in order to do it correctly. At least that’s my reading of it.
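
If it helps to connect that back to the original error message: the Keras Softmax layer applies a mask by adding a large negative bias at the masked positions, and that addition is where the “Incompatible shapes … [Op:AddV2]” error is raised. Here is a tiny self-contained reproduction using the shapes from the traceback (it assumes, as the traceback suggests, that the padding mask ends up being passed to a tf.keras.layers.Softmax):

```python
import tensorflow as tf

softmax = tf.keras.layers.Softmax()  # axis=-1 by default

# Shapes taken from the traceback above:
logits = tf.zeros((1, 2, 2, 150))    # attention logits: (batch, heads, q_len, k_len)
good_mask = tf.ones((1, 1, 1, 150))  # mask built from the 150-token encoder_input
bad_mask = tf.ones((1, 1, 1, 2))     # mask built from the 2-token decoder output

softmax(logits, mask=good_mask)      # fine: the mask broadcasts over the logits
softmax(logits, mask=bad_mask)       # raises: Incompatible shapes: [1,2,2,150] vs. [1,1,1,2]
```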

Thank you for the prompt reply.

I think I finally get it. In the Transformer class definition it is referred to as dec_padding_mask for generality, since it is intended for use in the second attention layer of the decoder, but in the model call inside the next_word function this parameter needs to be built from the encoder input.

I hope I am getting it right.

Thank you.
