C4W2 Assignment NLP Transformer Summariser Error

InvalidArgumentError: Exception encountered when calling layer 'softmax_296' (type Softmax).

I can't understand why my model summary is giving an error when all the tests above have passed. It points the error at my next_word function, but that function produced the expected output in all the exercises. Kindly assist with where I should look; I have been trying, but I think I'm going blurry-eyed now.

Thank you for your time.

Generally, these model summary errors occur because the unit test looks for a specific set of text in the layer names.

Sometimes there is more than one way to write code that works correctly, but the unit test has only one method it is expecting you to use.

Thank you, buddy, for the helpful response. So in this case, what do you suggest I do? I have been going through some functions and seem to have hit a brick wall, but I'm not stopping; I'm going through from the start and checking everything as well.

If you have any further advice, please share it. Thank you in advance.

I’m not a mentor for this course, so I don’t have any other thoughts on the issue.

Hopefully a mentor for this course will reply here.


Hi @Abiton_Padera

There are multiple probable causes for this error.

The first thing to check is the documentation for softmax. Note that this function takes just two arguments - the inputs and the axis. In most cases, the axis is the default one - the last axis. So the most common usage is simply tf.keras.activations.softmax(inputs); in other words, it receives just the inputs.

The most probable place for this error is Exercise 1 - have you correctly computed scaled_attention_logits (which is the input to the softmax)? Also, in Exercise 4 the final layer uses a softmax activation (which is already defined for you in the Dense layer's initialization, so you don't have to call it there).
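For reference, the standard scaled dot-product attention computation looks roughly like this (a generic sketch of the formula from the original Transformer paper, not the notebook's starter code - names and the exact mask convention may differ in your assignment):

    import tensorflow as tf

    def scaled_dot_product_attention_sketch(q, k, v, mask=None):
        # Raw compatibility scores between queries and keys: Q x K^T
        matmul_qk = tf.matmul(q, k, transpose_b=True)
        # Scale by sqrt(d_k) so the logits stay in a numerically stable range
        dk = tf.cast(tf.shape(k)[-1], tf.float32)
        scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
        # Push masked positions towards -inf so the softmax drives them to ~0
        # (here the mask uses 1 for valid tokens; your notebook may invert this)
        if mask is not None:
            scaled_attention_logits += (1. - mask) * -1e9
        # Softmax over the last axis (the default), i.e. across the keys
        attention_weights = tf.keras.activations.softmax(scaled_attention_logits)
        return tf.matmul(attention_weights, v), attention_weights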

Let me know if any of these help.
Cheers

Thank you for the guidance; I have also checked my softmax input against the documentation. Would you mind if I share the code, and could you have a look if you have time? It is also showing an error in my next_word function, but all expected outcomes are met and the tests pass.

In C4W2:
w2_unittest.test_next_word(next_word, transformer, encoder_input, output)

I’m getting “All tests passed!”

but on executing the subsequent cell ending in:
summarize(transformer, document[training_set_example])

I get an error message ending in:

mult_attn_out2, attn_weights_block2 = self.mha2(query=Q1, value=enc_output, key=enc_output, attention_mask=padding_mask, training=training, return_attention_scores=True)

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_58' (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
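If it helps, the failing addition itself can be reproduced in isolation - the last dimensions (150 vs. 2) are not broadcastable. (My own two-line sketch below, not the notebook's code.)

    import tensorflow as tf

    logits = tf.zeros((1, 2, 2, 150))  # same shape as the inputs in the trace
    mask = tf.zeros((1, 1, 1, 2))      # same shape as the mask in the trace
    logits + mask  # raises InvalidArgumentError: required broadcastable shapes [Op:AddV2]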

I would really appreciate any help! Thank you.


I tried all of these suggestions but still got the same error. What can I do? If you can provide more details, please do.

I hope you are using tf.nn.softmax. (INCORRECT SUGGESTION)

The above suggestion was not related to the issue the learner had, as the instructions clearly state what to use.


I was not, and I changed it to use tf.nn.softmax(scaled_attention_logits, axis=-1),

but unfortunately, I am still getting the same error message

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node _wrapped__AddV2_device/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_58' (type Softmax):
• inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
• mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)

Please help. Would it be possible for you to take a look at my notebook and see where I am going wrong?

Thank you very much.

OK, share the code for the particular cell where you encountered the error. Send it through a personal DM: click on my name, then Message.


I just messaged you with the code. Thank you.

Hello @Cawnpore_Charlie

Although I feel there might be issues with other graded cells too, let's go step by step with the graded cell you shared in the DM.

  1. In the code line below, you are clearly told to create the mask for the output only, and no separate instruction is given about shape-related dimensions, so your use of output.shape[1] is creating the first error:
    Create a look-ahead mask for the output

  2. Next, in the code line below, notice that it tells you to create the mask for the input (for the decoder), so using output here again is an incorrect choice (a combined sketch follows this list):
    Create a padding mask for the input (decoder)
    A hint for this comes from a cell before this exercise, which mentions:
    dec_padding_mask = create_padding_mask(inp) # Notice that both encoder and decoder padding masks are equal, so the input is the same for the encoder and decoder padding masks.
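Putting the two corrections together, the mask setup should look roughly like this (only a sketch - create_look_ahead_mask and create_padding_mask are the notebook's own helpers, and I am deliberately not writing out the full graded cell):

    # Correction 1: the look-ahead mask is built from the output sequence only,
    # with no extra shape arithmetic the instructions never asked for
    look_ahead_mask = create_look_ahead_mask(output)  # check your notebook's helper for the exact argument
    # Correction 2: the decoder padding mask is built from the *input* - the same
    # tensor the encoder padding mask uses
    dec_padding_mask = create_padding_mask(inp)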

Make these corrections, and let me know if you still get any new error.

Regards
DP


Thank you very much for your prompt and detailed help. I appreciate it greatly.


Please don't share code here. Your corrections are incorrect; read both corrections I mentioned carefully. There are two errors in your code.


Deepti - Hi. About using tf.nn.softmax, are you referring to the final layer in the Transformer class? The following line of code:
self.final_layer = tf.keras.layers.Dense(target_vocab_size, activation='softmax')
is before the ### START CODE HERE ###.

Or are you referring to using tf.nn.softmax in scaled_dot_product_attention? The 'Additional Hints' section states:
[screenshot of the hint omitted]

Is the hint for scaled_dot_product_attention wrong?
Thanks!
John

Hi @John_Murphy1

Sorry, that suggestion was not correct, and I completely forgot to edit it. His issue was that he was using the wrong inputs for the mask-creation steps in relation to the encoder and decoder.

Also, if you have a similar issue, kindly create a new topic. You can always link in your post to a comment from another thread that you want to use as a reference, to explain what you have already tried and where you are still unsure about how to resolve your error.

No, the hint is perfectly right.

Regards
DP


Thanks!

Hi,

How did you resolve this issue? I am facing the same issue.

  1. Next, in the code line below, notice that it tells you to create the mask for the input (for the decoder), so using output here again is an incorrect choice:
    Create a padding mask for the input (decoder)
    A hint for this comes from a cell before this exercise, which mentions:
    dec_padding_mask = create_padding_mask(inp) # Notice that both encoder and decoder padding masks are equal, so the input is the same for the encoder and decoder padding masks.

It’s been a while - hopefully, this guidance from Deepti resolves your issue.
