NLP with Attention Models C4W2 - Exercise 5 - next_word “softmax_404” error

I already posted this, but Deepti suggested that if I have a "similar issue, kindly always create a new topic." Here is the link to the original post:

Even though my next_word gives the expected outputs, [[14859]] and "masses", and it passes the tests for:
w2_unittest.test_next_word(next_word, transformer, encoder_input, output)
"All tests passed!"
when I try to summarize a sentence using this line of code:
summarize(transformer, document[training_set_example])
the training-set example and the human-written summary print correctly, but then I get this:
Model written summary:

which throws the following error:
InvalidArgumentError: Exception encountered when calling layer ‘softmax_404’ (type Softmax).
{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:
Call arguments received by layer ‘softmax_404’ (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)

The traceback from summarize points to this code in DecoderLayer.call:
mult_attn_out2, attn_weights_block2 = self.mha2(… …)

The suggestion from Deepti in the linked post was to "Create a look_ahead_mask for the output". My next_word does this by calling
create_look_ahead_mask
with tf.shape(output)[1].
Is that incorrect?

Here are my shapes:
Q1: shape=(1, 2, 128)
enc_padding_mask: shape=(1, 1, 150)
dec_padding_mask: shape=(1, 1, 2)
output: shape=(1, 2)
look_ahead_mask: shape=(1, 2, 2)
padding_mask: shape=(1, 1, 2)
transformer output: shape=(1, 7, 350)

Any suggestions? Thank you.
John

I have the same issue. Could someone please suggest what we should do?

Not sure if this will address your issue, but it is important to note that the padding mask for the second attention block in the decoder layer (the cross-attention block, not the first self-attention block) should be based on the encoder input's padding mask, not the decoder input's padding mask.
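Concretely, that second attention call ends up looking something like this (a sketch with tf.keras.layers.MultiHeadAttention; the variable names follow the assignment and may differ in your notebook):

# Block 2 (cross-attention): queries come from the decoder's first block,
# keys/values come from the encoder output, so the mask must be the one
# built from the encoder input.
mult_attn_out2, attn_weights_block2 = self.mha2(
    query=Q1,                     # output of the first (self-attention) block
    value=enc_output,             # encoder output
    key=enc_output,
    attention_mask=padding_mask,  # padding mask derived from the encoder input
    return_attention_scores=True,
)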

Thanks. Do you mean I am not supposed to use:
padding_mask (tf.Tensor): Boolean mask for the second multihead attention layer
for the attention_mask when I call self.mha2?

I tried that just now, with the same results. Any other ideas?

Thank you.
John

Hi @John_Murphy1

Thank you for following community guidelines :slight_smile:

Please DM me your code for the error you are encountering. Click on my name, then Message.

Your issue does not seem to be related to the linked post you shared, and this is the reason I ask learners to always create a new topic for their issue, even if it looks similar to an existing thread.

Your issue is with the way you have called input1 and input2, and it is probably related to the issue @Cawnpore_Charlie had. Based on the shape details you shared, your enc_padding_mask and dec_padding_mask shapes do not match, which could also be part of the problem.

Regards
DP


Hello @Siva_Kumar1

The error in the post you created does not match @John_Murphy1's error output, so kindly stick to your own post and don't get confused. I have responded to your post asking for more clarification on which exercise caused the error.

If you look here, John has created a proper topic with a clear header and the right selection of categories, mentioning which exercise caused the error, so the mentor knows where to check.

So please update your response on your own post thread.

Regards
DP

Hi @John_Murphy1

  1. For the code line
    Create a look-ahead mask for the output
    you have used the wrong output. You need to use the argument that is already defined for the output, i.e.
    output (tf.Tensor): (incomplete) target (summary)
    so wrapping it in a tf.Tensor or taking its shape is not required; just use output.

  2. Next, for "Create a padding mask for the input (decoder)", you again used the wrong input.
    Remember, the argument already defined for the input is
    encoder_input (tf.Tensor): Input data to summarize
    so the input for the encoder mask and the decoder mask will be the same (see the sketch below).
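
Putting points 1 and 2 together, the mask lines in next_word look roughly like this (a sketch only; helper names are from the assignment, and check the docstring of create_look_ahead_mask for exactly what it expects, since notebook versions differ):

enc_padding_mask = create_padding_mask(encoder_input)   # padding mask for the encoder input
dec_padding_mask = create_padding_mask(encoder_input)   # the decoder padding mask uses the SAME encoder_input
look_ahead_mask = create_look_ahead_mask(output)        # look-ahead mask built from the (incomplete) summary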

Regards
DP

Hi @John_Murphy1

Thank you for your patience.
According to the latest error output you shared in the DM, i.e.

the mismatch between the inputs happened because, in the graded cell transformer, you added an extra code line that creates a padding mask for the output sentence, and then used this dec_padding_mask to create the dec_output.

My updated lab assignment does not have that code line (i.e., the added "5-2 create the dec_padding_mask"), so kindly remove it.
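
In other words, Transformer.call should just pass along the masks it receives as arguments, roughly like this (a sketch; the exact argument names and order may differ in your notebook):

def call(self, input_sentence, output_sentence, training,
         enc_padding_mask, look_ahead_mask, dec_padding_mask):
    # Encode the input sentence with the padding mask passed in as an argument
    enc_output = self.encoder(input_sentence, training, enc_padding_mask)
    # Decode with the look-ahead mask and the dec_padding_mask passed in;
    # no extra create_padding_mask call is needed inside this function
    dec_output, attention_weights = self.decoder(
        output_sentence, enc_output, training, look_ahead_mask, dec_padding_mask)
    # Project the decoder output to vocabulary logits
    final_output = self.final_layer(dec_output)
    return final_output, attention_weights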

Let me know after this correction, if you get any new error.

Regards
DP

Hi

Thanks. Here is the error after that change, when running the "Check if your function works" cell under the next_word code block:

InvalidArgumentError: {{function_node _wrapped__Pack_N_3_device/job:localhost/replica:0/task:0/device:GPU:0}} Shapes of all inputs must match: values[0].shape = != values[1].shape = [1,1] [Op:Pack] name:

[screenshots attached: the error traceback and the updated transformer grader cell]

I included the error and the new state of the transformer cell. Let me know what you suggest.

Regards,
John

This seems like an issue with another graded cell, as suspected. Let me check your assignment notebook.

Regards
DP

Hi @John_Murphy1

Issues with your assignment, by graded cell:

GRADED FUNCTION: scaled_dot_product_attention
"softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1", so you do not need to pass an axis argument in this call.

GRADED FUNCTION: DecoderLayer
For Block 1, the instructions clearly mention that dropout will be applied during training, so you do not need training=training in your self-attention call for mult_attn_out1. The same issue applies to mult_attn_out2.

GRADED FUNCTION: Decoder
No mistakes

GRADED FUNCTION: Transformer
No mistakes

GRADED FUNCTION: next_word
For the padding-mask code line you were supposed to use create_padding_mask, but you used create_look_ahead_mask (as in "Create a look-ahead mask for the output"). This is the main reason behind your error.
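
For reference, here is a minimal sketch of scaled dot-product attention that relies on the default last-axis softmax (an illustration only, not the notebook's exact code; the mask handling here assumes a mask value of 1 marks tokens to keep):

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # Raw attention scores, shape (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # Positions where mask == 0 get a large negative logit and vanish after softmax
        scaled_attention_logits += (1.0 - mask) * -1e9
    # tf.nn.softmax defaults to the last axis (seq_len_k), so no axis argument is needed
    attention_weights = tf.nn.softmax(scaled_attention_logits)
    return tf.matmul(attention_weights, v), attention_weights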

Regards
DP

Thank you. I appreciate your time and effort on this. Your comments helped me learn from my mistakes, not just correct them.

Thank you again.


Hi - to anyone who runs into this “softmax_404” error when running next_word, here were the sources of my error:

  1. scaled_dot_product_attention: the softmax is normalized on the last axis, so the scores add up to one. Because of this, you do not need to pass an axis.
  2. In the DecoderLayer, I missed the instructions saying that dropout will be applied during training, so I had a training=training that was not needed for mult_attn_out1 and mult_attn_out2.
  3. In next_word, the instructions say to use create_padding_mask, but I used a look_ahead_mask. (See the sketch below for a quick way to sanity-check your mask shapes.)
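
If it helps anyone sanity-check their mask shapes, here is a tiny self-contained sketch of helpers that behave like the ones provided in the notebook (an illustration only; the notebook's own create_padding_mask and create_look_ahead_mask are the source of truth, and your version of create_look_ahead_mask may take the output tensor rather than its length):

import tensorflow as tf

def create_padding_mask(token_ids):
    # 1.0 for real tokens, 0.0 for padding (token id 0); result shape (batch, 1, seq_len)
    mask = tf.cast(tf.math.not_equal(token_ids, 0), tf.float32)
    return mask[:, tf.newaxis, :]

def create_look_ahead_mask(sequence_length):
    # Lower-triangular matrix so position i attends only to positions <= i;
    # result shape (1, sequence_length, sequence_length)
    return tf.linalg.band_part(tf.ones((1, sequence_length, sequence_length)), -1, 0)

encoder_input = tf.ones((1, 150), dtype=tf.int32)  # dummy tokenized document
output = tf.ones((1, 2), dtype=tf.int32)           # dummy partial summary

print(create_padding_mask(encoder_input).shape)       # (1, 1, 150): shape of the enc/dec padding mask
print(create_look_ahead_mask(output.shape[1]).shape)  # (1, 2, 2): shape of the look-ahead mask

These match the shapes I posted earlier in the thread, (1, 1, 150) and (1, 2, 2).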
Hope this helps. Good luck.