C4W2 NLP with Attention Models Exercise 5 - next_word “softmax_404” error
I already posted this, but Deepti suggested that if I have a similar issue I should "kindly always create a new topic". Here is a link to the original post:
Even though my next_word gives the expected outputs, [[14859]] and "masses", and it passes the tests, including this one:
w2_unittest.test_next_word(next_word, transformer, encoder_input, output)
" All tests passed!"
When I try to summarize a sentence using this line of code:
summarize(transformer, document[training_set_example])
After the training set example and the human-written summary come out correctly, I get this:
Model written summary:
which throws the following error:
InvalidArgumentError: Exception encountered when calling layer ‘softmax_404’ (type Softmax).
{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:
Call arguments received by layer ‘softmax_404’ (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
The traceback from summarize points to this code in DecoderLayer.call:
mult_attn_out2, attn_weights_block2 = self.mha2(… …)
Deepti's suggestion in the linked post was to "Create a look_ahead_mask for the output". My next_word does this by calling
create_look_ahead_mask
with: tf.shape(output)[1]
Is that incorrect?
Here are my shapes:
Q1: shape= (1, 2, 128)
enc_padding_mask: shape=(1, 1, 150)
dec_padding_mask: shape=(1, 1, 2)
output: shape=(1, 2)
look_ahead_mask: shape=(1, 2, 2)
padding_mask: shape=(1, 1, 2)
Output of transformer shape=(1, 7, 350)
Not sure if this will address your issue, but it is important to note that the padding mask for the second attention block of the decoder layer (the cross-attention block, not the first self-attention block) should be based on the encoder input padding mask, not the decoder input padding mask.
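To illustrate what I mean, here is a rough standalone sketch using tf.keras.layers.MultiHeadAttention. It is my own toy example (the class name, shapes, and mask are made up for illustration), not the assignment's DecoderLayer:

import tensorflow as tf

# Toy cross-attention block: queries come from the decoder (the partial summary),
# but keys/values come from the encoder output, so the mask must have one entry
# per *encoder* position.
class TinyCrossAttention(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()
        self.mha2 = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)

    def call(self, x, enc_output, padding_mask):
        return self.mha2(query=x, value=enc_output, key=enc_output,
                         attention_mask=padding_mask,
                         return_attention_scores=True)

x = tf.random.uniform((1, 2, 16))             # decoder input: partial summary, length 2
enc_output = tf.random.uniform((1, 150, 16))  # encoder output: document, length 150
enc_padding_mask = tf.ones((1, 1, 150))       # built from the *encoder* input (1 = attend)
_, attn_weights = TinyCrossAttention()(x, enc_output, enc_padding_mask)
print(attn_weights.shape)                     # (1, 2, 2, 150)

The attention scores here have the same (1, 2, 2, 150) shape as in your error. If you instead build that mask from the two-token decoder input, you get a (1, 1, 2) mask, which the layer expands to (1, 1, 1, 2) and then cannot add to the (1, 2, 2, 150) scores; that is exactly the shape mismatch in your traceback.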
Thanks. Do you mean I am not supposed to use:
padding_mask (tf.Tensor): Boolean mask for the second multihead attention layer
for the attention_mask when I call self.mha2?
I tried that just now, with the same results. Any other ideas?
Please DM me your code for the error you are encountering. Click on my name, then Message.
Your issue does not seem to be related to the post you linked, which is why I tell learners to always create a new topic for their issue even if similar threads exist.
Your issue is with the way you have used input1 and input2, and it is probably related to the issue @Cawnpore_Charlie had; I can see this from the shape details you shared.
Your enc_padding_mask and dec_padding_mask shapes are not the same, which could also be one of the issues.
In the post you created, your error does not match @John_Murphy1's error output. So kindly stick to your own post and don't get confused. I have responded to your post asking for clarification on which exercise caused the error.
As you can see here, John has created a proper topic with a clear header and the right selection of categories, mentioning which exercise caused the error, so the mentor knows where to look.
So please post your update in your own thread.
For the code line below:
Create a look-ahead mask for the output
you have used the incorrect output. You need to use the argument that is already assigned for the output, i.e.
output (tf.Tensor): (incomplete) target (summary)
so including tf.Tensor or tf.shape is not required; just use output.
Next, for Create a padding mask for the input (decoder), you again used the incorrect input.
Remember, the argument already assigned for the input is
encoder_input (tf.Tensor): Input data to summarize
so the input for the encoder mask as well as the decoder mask will be the same.
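In case it helps, here is a minimal sketch of the mask construction described above. The helper body below is my own approximation of the notebook's create_padding_mask (1 = keep, 0 = padding), and the toy tensors are made up, so treat it as illustrative only:

import tensorflow as tf

# Approximation of the notebook's helper (1 = keep, 0 = padding); the real one may differ.
def create_padding_mask(token_ids):
    keep = 1.0 - tf.cast(tf.math.equal(token_ids, 0), tf.float32)
    return keep[:, tf.newaxis, :]                         # (batch, 1, seq_len)

encoder_input = tf.constant([[5, 7, 9] + [0] * 147])      # toy document, padded to length 150
output = tf.constant([[1, 42]])                           # toy partial summary, length 2

# Both padding masks come from encoder_input, because the decoder's cross-attention
# attends over the 150 encoder positions, not the 2 summary positions.
enc_padding_mask = create_padding_mask(encoder_input)     # (1, 1, 150)
dec_padding_mask = create_padding_mask(encoder_input)     # (1, 1, 150), not (1, 1, 2)
print(enc_padding_mask.shape, dec_padding_mask.shape)

I have left the look-ahead mask out of the sketch, since whether create_look_ahead_mask takes the output tensor itself or its length seems to depend on the notebook version.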
Thank you for your patience.
According to the latest error output you shared in the DM, the mismatch between the inputs happened because, in the graded transformer cell,
you have added an extra code line
that creates a padding mask for the output sentence and then uses this dec_padding_mask to create the dec_output.
In my updated lab assignment I do not have that code line (i.e. the added '5-2 create the dec_padding_mask' line); kindly remove it.
Let me know after this correction, if you get any new error.
GRADED FUNCTION: scaled_dot_product_attention
The softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1. You do not need to pass an axis argument in this call, since the default already applies the softmax over the last axis.
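For anyone reading along, this is a rough standalone paraphrase of that idea (my own sketch with a made-up name, not the graded solution, and it assumes the notebook's mask convention of 1 = keep, 0 = padding):

import tensorflow as tf

def scaled_dot_product_attention_sketch(q, k, v, mask=None):
    matmul_qk = tf.matmul(q, k, transpose_b=True)          # (..., seq_len_q, seq_len_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_logits = matmul_qk / tf.math.sqrt(dk)
    if mask is not None:
        # assumed convention: mask == 1 for tokens to keep, 0 for padding
        scaled_logits += (1.0 - mask) * -1e9
    # softmax over the last axis (seq_len_k); tf.nn.softmax already defaults to
    # axis=-1, which is why no explicit axis argument is needed
    attention_weights = tf.nn.softmax(scaled_logits)
    return tf.matmul(attention_weights, v), attention_weights

# quick check: each row of weights sums to 1 over the key positions
q = tf.random.uniform((1, 2, 8))
k = v = tf.random.uniform((1, 5, 8))
_, w = scaled_dot_product_attention_sketch(q, k, v)
print(tf.reduce_sum(w, axis=-1))                           # approximately [[1., 1.]]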
GRADED FUNCTION: DecoderLayer
For Block 1, it is clearly mentioned that dropout will be applied during training, so you do not need training=training in your self-attention call for mult_attn_out1. The same issue applies to mult_attn_out2.
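My understanding of why the explicit flag is unnecessary: tf.keras.layers.MultiHeadAttention applies its internal dropout only in training mode, and Keras propagates the training flag to nested layer calls automatically. A tiny standalone demonstration (not the assignment code; the layer settings and shapes are made up):

import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8, dropout=0.5)
x = tf.random.uniform((1, 4, 16))

out_eval = mha(x, x, x)                  # inference by default: dropout inactive
out_train = mha(x, x, x, training=True)  # training mode activates the attention dropout
print(out_eval.shape, out_train.shape)   # both (1, 4, 16)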
GRADED FUNCTION: Decoder
No mistakes
GRADED FUNCTION: Transformer
No mistakes
GRADED FUNCTION: next_word
For the code line below, you were supposed to use create_padding_mask, but you have used create_look_ahead_mask:
Create a look-ahead mask for the output
This is the main reason behind your error.
Hi - to anyone who runs into this “softmax_404” error problem when running next_word, here are some of the sources of my error:
scaled_dot_product_attention: the softmax is normalized on the last axis, so the scores add up to one. Because of this, you do not need to pass an axis argument.
In the DecoderLayer, I missed the instruction that dropout will be applied during training, so I had an unnecessary training=training in the calls for mult_attn_out1 and mult_attn_out2.
In next_word, the instructions say to use create_padding_mask, but I used create_look_ahead_mask.
Hope this helps. Good luck.