C4W2 Assignment NLP Transformer Summariser Error

InvalidArgumentError: Exception encountered when calling layer 'softmax_296' (type Softmax).

I can't understand why my model summary is giving an error when all the tests above have passed. It points the error at my next_word function, but that function produced the expected output in all the exercises. Kindly assist with where I should look; I have been trying, but I think I'm going blurry-eyed now.

Thank you for your time.

Generally, these model summary errors occur because the unit test looks for a specific set of text in the layer names.

Sometimes there is more than one way to write code that works correctly, but the unit test has only one method it is expecting you to use.

Thank you, buddy, for the helpful response. So in this case, what do you suggest I do? I have been going through some functions and seem to have hit a brick wall, but I'm not stopping; I'm going through from the start and checking everything as well.

If you have any further advice, please share it. Thank you in advance.

I’m not a mentor for this course, so I don’t have any other thoughts on the issue.

Hopefully a mentor for this course will reply here.


Hi @Abiton_Padera

There are multiple probable causes for this error.

The first thing to check is the documentation for softmax. Note that this function takes just two arguments - the inputs and the axis. In most cases, the axis is the default one - the last axis. So the most common usage is simply tf.keras.activations.softmax(inputs); in other words, it receives just the inputs.

The most probable place for this error is Exercise 1 - have you correctly computed scaled_attention_logits (which is the input to the softmax)? Also, in Exercise 4 the final layer uses a softmax activation (which is already defined for you in the Dense layer's initialization, so you don't have to call it there).
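For reference, the standard scaled dot-product attention computation looks roughly like this (a generic sketch of the formula from the original Transformer paper, not the notebook's starter code - names and the exact mask convention may differ in your assignment):

    import tensorflow as tf

    def scaled_dot_product_attention_sketch(q, k, v, mask=None):
        # Raw compatibility scores between queries and keys: Q x K^T
        matmul_qk = tf.matmul(q, k, transpose_b=True)
        # Scale by sqrt(d_k) so the logits stay in a numerically stable range
        dk = tf.cast(tf.shape(k)[-1], tf.float32)
        scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
        # Push masked positions towards -inf so the softmax drives them to ~0
        # (here the mask uses 1 for valid tokens; your notebook may invert this)
        if mask is not None:
            scaled_attention_logits += (1. - mask) * -1e9
        # Softmax over the last axis (the default), i.e. across the keys
        attention_weights = tf.keras.activations.softmax(scaled_attention_logits)
        return tf.matmul(attention_weights, v), attention_weights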

Let me know if any of these help.
Cheers

Thank you for the guidance; I have also checked my softmax input against the documentation. Would you mind if I share the code, and could you have a look if you have time? It is also showing an error in my next_word function, but all expected outcomes are met and the tests pass.

In C4W2:
w2_unittest.test_next_word(next_word, transformer, encoder_input, output)

I’m getting “All tests passed!”

but on executing the subsequent cell ending in:
summarize(transformer, document[training_set_example])

I get an error message ending in:

mult_attn_out2, attn_weights_block2 = self.mha2(query=Q1, value=enc_output, key=enc_output, attention_mask=padding_mask, training=training, return_attention_scores=True)

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_58' (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
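If it helps, the failing addition itself can be reproduced in isolation - the last dimensions (150 vs. 2) are not broadcastable. (My own two-line sketch below, not the notebook's code.)

    import tensorflow as tf

    logits = tf.zeros((1, 2, 2, 150))  # same shape as the inputs in the trace
    mask = tf.zeros((1, 1, 1, 2))      # same shape as the mask in the trace
    logits + mask  # raises InvalidArgumentError: required broadcastable shapes [Op:AddV2]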

I would really appreciate any help! Thank you.


I tried all of these suggestions but still got the same error. What can I do? If you can provide more details, please do.

I hope you are using tf.nn.softmax. (INCORRECT SUGGESTION)

The above suggestion was not related to the issue the learner had, as the instructions clearly state what to use.


I was not, and I changed it to use tf.nn.softmax(scaled_attention_logits, axis=-1),

but unfortunately, I am still getting the same error message

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_58' (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)

Please help. Would it be possible for you to take a look at my notebook and see where I am going wrong?

Thank you very much.

OK, share the code for the particular cell where you encountered the error. Send it through a personal DM: click on my name, then Message.


I just messaged you with the code. Thank you.

Hello @Cawnpore_Charlie

Although I feel there might be issues with other graded cells too, let's go step by step with the graded cell you shared in the DM.

  1. In the code line below, you are clearly told to create the mask for the output only, and no separate instruction is given about shape-related dimensions, so your use of output.shape[1] is creating the first error:
    Create a look-ahead mask for the output

  2. Next, in the code line below, notice that it tells you to create the mask for the input (for the decoder), so using output here again is an incorrect choice (a combined sketch follows this list):
    Create a padding mask for the input (decoder)
    A hint for this comes from a cell before this exercise, which mentions:
    dec_padding_mask = create_padding_mask(inp) # Notice that both encoder and decoder padding masks are equal, so the input is the same for the encoder and decoder padding masks.
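Putting the two corrections together, the mask setup should look roughly like this (only a sketch - create_look_ahead_mask and create_padding_mask are the notebook's own helpers, and I am deliberately not writing out the full graded cell):

    # Correction 1: the look-ahead mask is built from the output sequence only,
    # with no extra shape arithmetic the instructions never asked for
    look_ahead_mask = create_look_ahead_mask(output)  # check your notebook's helper for the exact argument
    # Correction 2: the decoder padding mask is built from the *input* - the same
    # tensor the encoder padding mask uses
    dec_padding_mask = create_padding_mask(inp)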

Make these corrections, and let me know if you still get any new error.

Regards
DP


Thank you very much for your prompt and detailed help. I appreciate it greatly.


Please don't share code here. Your corrections are incorrect; read both corrections I mentioned carefully. There are two errors in your code.


Deepti - Hi. About using tf.nn.softmax, are you referring to the final layer in the Transformer class? The following line of code:
self.final_layer = tf.keras.layers.Dense(target_vocab_size, activation='softmax')
is before the ### START CODE HERE ###.

Or are you referring to using tf.nn.softmax in scaled_dot_product_attention? The 'Additional Hints' section states:
[screenshot of the hint omitted]

Is the hint for scaled_dot_product_attention wrong?
Thanks!
John

Hi @John_Murphy1

Sorry, that suggestion was not correct, and I completely forgot to edit it. His issue was that he was using the wrong inputs for the mask-creation steps in relation to the encoder and decoder.

Also, if you have a similar issue, kindly create a new topic. You can always link in your post to a comment from another thread that you want to use as a reference, to explain what you have already tried and where you are still unsure about how to resolve your error.

No, the hint is perfectly right.

Regards
DP


Thanks!

Hi,

How did you resolve this issue? I am facing the same issue.

  1. Next, in the code line below, notice that it tells you to create the mask for the input (for the decoder), so using output here again is an incorrect choice:
    Create a padding mask for the input (decoder)
    A hint for this comes from a cell before this exercise, which mentions:
    dec_padding_mask = create_padding_mask(inp) # Notice that both encoder and decoder padding masks are equal, so the input is the same for the encoder and decoder padding masks.

It’s been a while - hopefully, this guidance from Deepti resolves your issue.
