Problem with Natural Language Processing with Attention Models

I'm having a problem with the graded assignment submission. The notebook passes all the tests, but when I submit it shows:

```
There was a problem compiling the code from your notebook. Details:

Exception encountered when calling layer 'softmax_3' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} Incompatible shapes: [1,2,2,150] vs. [1,1,1,2] [Op:AddV2] name: 

Call arguments received by layer 'softmax_3' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I have no clue; please help.

hi @Debottam_Bakshi_Gupt

Your error probably means you need to check how you are passing your masks.

is this the complete error log?

Hi @Deepti_Prasad ,

Yes, this is the full log, and my code passed all the tests, so I have no clue at which point it is failing.

Please check your inputs and how you have passed your masks; most probably, check your decoder and how it passes the padding mask.

Hi @Deepti_Prasad ,

The submission error I get starts from the very first exercise:
```
GRADED FUNCTION: scaled_dot_product_attention

def scaled_dot_product_attention(q, k, v, mask):
```
So it's not only the decoder. And `create_padding_mask` is provided by default:
```
def create_padding_mask(decoder_token_ids):
```
So again no clue; please help.

Thanks,

Debottam

When any of the functions under test throws an exception in the grader, you get 0 for all sections, because the grader cannot complete execution.

This syndrome where you pass the tests in the notebook, but then fail the grader, is a common one. It means that the code you have written for at least one of the functions is not general. In some way it is “hard-coded” to match the one set of test data in the notebook. Ways that this could happen would be referencing global variables from the body of your function instead of the formal parameters or hard-coding dimensions somewhere.
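As a hypothetical illustration of that hard-coding pitfall (all names here are invented for the example, not taken from the assignment): a function that silently reads a notebook global instead of its formal parameters can pass the notebook's tests, yet break on the grader's differently-shaped data.

```python
import numpy as np

# Invented names for illustration; not the assignment's code.
notebook_k = np.ones((2, 4))  # a global defined earlier in a notebook

def scaled_logits_hardcoded(q, k):
    # BUG: silently uses the notebook global instead of the parameter `k`,
    # so it only works when the test data matches the notebook's shapes.
    return np.matmul(q, notebook_k.T) / np.sqrt(notebook_k.shape[-1])

def scaled_logits_general(q, k):
    # Correct: every value and dimension comes from the formal parameters.
    return np.matmul(q, k.T) / np.sqrt(k.shape[-1])

q = np.ones((3, 8))  # the "grader's" data: a different depth than the global
k = np.ones((5, 8))
print(scaled_logits_general(q, k).shape)  # (3, 5)
# scaled_logits_hardcoded(q, k) raises a shape error on this data
```

The general version works for any shapes; the hard-coded one throws exactly the kind of shape exception the grader reports.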

Start this analysis with the function cell that includes the softmax_3 layer. If that clue is not sufficient, then there are other ways we can help, but please apply the above suggestions and let us know what you find.

hi @Debottam_Bakshi_Gupt

Try the pointers @paulinpaloalto mentioned in his comments; otherwise refer to the post below, which describes where and what might have gone wrong. It is a similar error encountered previously by another learner. See if you have made any similar mistakes.

Solution to C4W2, softmax error

If this doesn't resolve the issue, let us know.

regards
DP


Hi @Deepti_Prasad , @paulinpaloalto ,

It's not the same problem. Here is another stack trace, from the ungraded part of the same exercise:
```
Training set example:
[SOS] amanda: i baked cookies. do you want some? jerry: sure! amanda: i’ll bring you tomorrow :) [EOS]

Human written summary:
[SOS] amanda baked cookies and will bring jerry some tomorrow. [EOS]

Model written summary:

2026-01-01 05:50:44.849106: W tensorflow/core/framework/op_kernel.cc:1816] INVALID_ARGUMENT: required broadcastable shapes

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
Cell In[33], line 9
      7 print(summary[training_set_example])
      8 print('\nModel written summary:')
----> 9 summarize(transformer, document[training_set_example])

Cell In[32], line 16, in summarize(model, input_document)
     13 output = tf.expand_dims([tokenizer.word_index["[SOS]"]], 0)
     15 for i in range(decoder_maxlen):
---> 16     predicted_id = next_word(model, encoder_input, output)
     17     output = tf.concat([output, predicted_id], axis=-1)
     19     if predicted_id == tokenizer.word_index["[EOS]"]:

Cell In[29], line 20, in next_word(model, encoder_input, output)
     17 dec_padding_mask = create_padding_mask(output)
     19 # Run the prediction of the next word with the transformer model
---> 20 predictions, attention_weights = model(
     21     encoder_input,
     22     output,
     23     True,
     24     enc_padding_mask,
     25     look_ahead_mask,
     26     dec_padding_mask
     27 )
     28 ### END CODE HERE ###
     30 predictions = predictions[: ,-1:, :]

File /usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

Cell In[21], line 57, in Transformer.call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask)
     53 enc_output = self.encoder(input_sentence, training, enc_padding_mask)
     55 # call self.decoder with the appropriate arguments to get the decoder output
     56 # dec_output.shape == (batch_size, tar_seq_len, fully_connected_dim)
---> 57 dec_output, attention_weights = self.decoder(output_sentence, enc_output, training, look_ahead_mask, dec_padding_mask)
     59 # pass decoder output through a linear layer and softmax (~1 line)
     60 final_output = self.final_layer(dec_output)

Cell In[18], line 66, in Decoder.call(self, x, enc_output, training, look_ahead_mask, padding_mask)
     62 # use a for loop to pass x through a stack of decoder layers and update attention_weights (~4 lines total)
     63 for i in range(self.num_layers):
     64     # pass x and the encoder output through a stack of decoder layers and save the attention weights
     65     # of block 1 and 2 (~1 line)
---> 66     x, block1, block2 =  self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)
     67     #update attention_weights dictionary with the attention weights of block 1 and block 2
     68     attention_weights['decoder_layer{}_block1_self_att'.format(i+1)] = block1

Cell In[15], line 67, in DecoderLayer.call(self, x, enc_output, training, look_ahead_mask, padding_mask)
     61 Q1 = self.layernorm1(x + mult_attn_out1)
     63 # BLOCK 2
     64 # calculate self-attention using the Q from the first block and K and V from the encoder output. 
     65 # Dropout will be applied during training
     66 # Return attention scores as attn_weights_block2 (~1 line) 
---> 67 mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores = True)
     69 # # apply layer normalization (layernorm2) to the sum of the attention output and the Q from the first block (~1 line)
     70 mult_attn_out2 = self.layernorm2(Q1 + mult_attn_out2)

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name: 

Call arguments received by layer 'softmax_58' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)

```

OK, this actually helps show where your code might be going wrong, and the link I shared earlier had the same issue.

Two issues are clearly visible, and both are mentioned in the link I shared earlier. First, don't pass `training` as `True` when passing information from the block layers to multi-head attention. Read the instructions again carefully: `training=training` is only to be used in one place.

Next, your predicted-ID code is sequenced incorrectly: you should not apply the model first and then supply the output and input. Any model call should follow the expected input, output, model ordering.

The post comment from the same link I shared earlier should address the issue.

But I suspect you might have more errors, so let us know. Please take your time: start again from exercise 1, read every instruction, and see what you might be missing in the code you wrote. Remember that some of the unit tests don't catch all the variability in text generation; the code might pass a test, yet its implementation might still be incorrect, causing an error in a later exercise.
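The shapes in the error log make the failure concrete: the mask's last axis (length 2, the length of the decoder `output` it was built from) has to broadcast against the logits' key axis (length 150). A minimal NumPy sketch of the mismatch, with the shapes copied from the log:

```python
import numpy as np

# Shapes copied from the error log:
logits = np.zeros((1, 2, 2, 150))  # (batch, heads, query_len, key_len)
mask = np.zeros((1, 1, 1, 2))      # last axis should be key_len (150), not 2

try:
    _ = logits + (1. - mask) * -1e9  # the mask-add inside the softmax layer
except ValueError as err:
    print("broadcast error:", err)

# A mask built from the encoder input has the right last axis and broadcasts:
good_mask = np.zeros((1, 1, 1, 150))
print((logits + (1. - good_mask) * -1e9).shape)  # (1, 2, 2, 150)
```

This is why a mask created from the wrong tensor passes the notebook's tests (where the lengths happen to line up) but fails on other data.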

Hi @Deepti_Prasad ,

All my tests pass, up to the last graded exercise.

The `summarize` code is provided by default; I didn't change anything:
```
def summarize(model, input_document):
    """
    A function for summarization using the transformer model
    Arguments:
        input_document (tf.Tensor): Input data to summarize
    Returns:
        _ (str): The summary of the input_document
    """
    input_document = tokenizer.texts_to_sequences([input_document])
    input_document = tf.keras.preprocessing.sequence.pad_sequences(input_document, maxlen=encoder_maxlen, padding='post', truncating='post')
    encoder_input = tf.expand_dims(input_document[0], 0)

    output = tf.expand_dims([tokenizer.word_index["[SOS]"]], 0)

    for i in range(decoder_maxlen):
        predicted_id = next_word(model, encoder_input, output)
        output = tf.concat([output, predicted_id], axis=-1)

        if predicted_id == tokenizer.word_index["[EOS]"]:
            break

    return tokenizer.sequences_to_texts(output.numpy())[0]  # since there is just one translated document

```

Hi @Deepti_Prasad
I have a problem with this:
```
please make sure not to post any grade functions codes as it is violation of code of conduct.
```
Please check carefully; I posted the ungraded part.

Please send me a screenshot of the code by personal DM. I am sending you a "Hi".

OK, understood that it is the ungraded part of the code, but running that code gave an error indicating you probably passed an incorrect argument in the graded function code. Chances are the unit test didn't catch that error because of a mix-up between global and local variables.

Let me go through your code and get back to you.

@Debottam_Bakshi_Gupt

Here are the issues with your assignment:

  1. In exercise 1, GRADED FUNCTION: scaled_dot_product_attention, at the code line "add the mask to the scaled tensor": your code adds the mask to the scaled tensor incorrectly. The instructions tell you to multiply (1. - mask) by -1e9 before adding it to the scaled attention logits. You did multiply by this value, but you also added unnecessary reshaping in an iterative loop before the scaled attention logits, which was not required at all.
  2. In exercise 2, GRADED FUNCTION: DecoderLayer: when applying the normalization layers, your placement of the attention output and the input is incorrect in both block 1 and block 2. Put the attention output first and then the input value in both of these normalization layers, as the embedding dimension can otherwise confuse the output.
  3. In exercise 5, GRADED FUNCTION: next_word, at the code line "Create a padding mask for the input (decoder)": you are passing the wrong argument to create_padding_mask. You used the local variable output, whereas the instructions clearly tell you to pass the input, i.e. encoder_input. Remember that when we pass the mask from the encoder to the decoder, the input is the same, i.e. encoder_input.
  4. In the same exercise 5, for the code line "run the prediction of the next word with the transformer model": the instructions under the Exercise 5 header ("Hint: this is very similar to what happens in the train_step, but you have to set the training of the model to False") tell you to set training to False, but you set it to True while predicting the next word.
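For point 1, the quoted instruction ("multiply (1. - mask) by -1e9 before adding it to the scaled attention logits") maps to a single broadcasted line, with no loop or reshaping needed. A minimal sketch of that well-known pattern (the function and variable names here are illustrative, not the assignment's graded solution):

```python
import numpy as np

def attention_logits(q, k, mask=None):
    # Scale q @ k^T by the square root of the key depth.
    dk = k.shape[-1]
    scaled = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(dk)
    if mask is not None:
        # (1. - mask) is 1 at padded positions; multiplying by -1e9 drives
        # their softmax weight to ~0. Broadcasting aligns the mask with the
        # logits, so no iterative reshaping is required.
        scaled += (1. - mask) * -1e9
    return scaled

q = np.ones((1, 2, 4))             # (batch, query_len, depth)
k = np.ones((1, 3, 4))             # (batch, key_len, depth)
mask = np.array([[[1., 1., 0.]]])  # last key position is padding
print(attention_logits(q, k, mask)[0, 0])  # last entry is ~ -1e9
```

After a softmax, the masked position would receive essentially zero attention weight, which is the whole point of the mask-add.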

Do these corrections to pass your assignment.

Regards

DP

Hi @Deepti_Prasad ,

Thanks a lot.

Debottam
