C4W2 Assignment Exercise 5 - next_word

I have completed Exercise 5 - next_word of the assignment and all the test cases pass; I also get the full grade for the assignment, but the model does not output anything. I am not sure if I am doing something wrong that none of the test cases capture, or if there is a problem with the notebook:
Predicted token:
Predicted word:

Expected Output

Predicted token: [[14859]]
Predicted word: masses

Training set example:
[SOS] amanda: i baked cookies. do you want some? jerry: sure! amanda: i’ll bring you tomorrow :slight_smile: [EOS]

Human written summary:
[SOS] amanda baked cookies and will bring jerry some tomorrow. [EOS]

Model written summary:
[SOS]

Strange that the unit test did not catch the error. Please send me a copy of your code in a private chat and I will give it a look.

The issue was found in how the output target variable was being manipulated.

```
---> 67 mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)
     69 # apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
     70 mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_58' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I got an error at next_word, but passed all previous unit tests. I am not sure how to proceed.

@scheine I can't be very sure without looking at the code, but have a look at look_ahead_mask or dec_padding_mask in the next_word function; that might be the point of failure. Note that create_look_ahead_mask takes a sequence length as its argument, unlike create_padding_mask.
If that does not solve it, send me your code.
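
For illustration only, toy stand-ins for the two helpers (written from memory to show the argument types and resulting mask shapes, not the notebook's graded code) would look roughly like this:

```python
import tensorflow as tf

# Toy stand-ins for the notebook helpers -- check your own notebook for the graded versions.
def create_padding_mask(decoder_token_ids):
    # takes a TENSOR of token ids: 1.0 where there is a real token, 0.0 at padding
    mask = 1.0 - tf.cast(tf.math.equal(decoder_token_ids, 0), tf.float32)
    return mask[:, tf.newaxis, :]                      # (batch, 1, seq_len)

def create_look_ahead_mask(sequence_length):
    # takes an INTEGER length and builds a lower-triangular (causal) mask
    return tf.linalg.band_part(tf.ones((1, sequence_length, sequence_length)), -1, 0)

output = tf.constant([[7, 14859]])                     # tokens generated so far
print(create_padding_mask(output).shape)               # (1, 1, 2)
print(create_look_ahead_mask(output.shape[1]).shape)   # (1, 2, 2)
```

The shapes here happen to match the masks printed further down in this thread, but treat the bodies as sketches, not as the assignment solution.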

Thanks, I fixed the shape issue after updating the code in create_padding_mask.

@scheine I am glad.
I would also request that you remove the solution code from your reply, as sharing solutions is against our community guidelines.

Regards,

Thanks, updated.


@jyadav202 I have the same error, and all unit tests passed. Can you give a hint about what is wrong?

```
mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, attention_mask = padding_mask, return_attention_scores=True)
     69 # apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
     70 mult_attn_out2 = self.layernorm2(Q1 + mult_attn_out2)

InvalidArgumentError: Exception encountered when calling layer 'softmax_92' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_92' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I’m not a mentor for this course, but the assignment seems similar to one from DLS Course 5 Week 4.

I would guess the issue is that your Q1 is not the correct shape.

Hi!
Yes, please recheck Q1, and also check whether your output matches the "Expected Output". If the error still persists, send me a copy of your assignment and I will take a look.

Hi,
My problem is similar, except my error is "softmax_404".
My graded function next_word gives the expected output [[14859]] and masses, it passes the tests, and as part of this:
w2_unittest.test_next_word(next_word, transformer, encoder_input, output)
it prints "All tests passed!"

But when I try to summarize a sentence, this line:
summarize(transformer, document[training_set_example])

references this code from DecoderLayer.call:
mult_attn_out2, attn_weights_block2 = self.mha2(… )

which throws the following error:
```
InvalidArgumentError: Exception encountered when calling layer 'softmax_404' (type Softmax).
{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:
Call arguments received by layer 'softmax_404' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I printed these from next_word:
```
enc_padding_mask: tf.Tensor(
[[[1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]], shape=(1, 1, 150), dtype=float32)
look_ahead_mask: tf.Tensor([[[1.]]], shape=(1, 1, 1), dtype=float32)
dec_padding_mask: tf.Tensor([[[1.]]], shape=(1, 1, 1), dtype=float32)
output: tf.Tensor([[7]], shape=(1, 1), dtype=int32)
Predicted token: [[14859]]
Predicted word: masses
```

Any suggestions would be greatly appreciated. Thanks.
John

Using
dec_padding_mask = create_padding_mask(encoder_input) rather than dec_padding_mask = create_padding_mask(output) fixed it, and it works now.

My understanding is that the dec_padding_mask is based on the encoder_input, as it is intended to mask out the padding tokens in the original input sequence, which flows through the encoder and becomes the enc_output.

The padding mask is applied in the second MHA block, which receives the encoder's output as keys and values. This padding should be identical to the padding applied to the original encoder input.

Unfortunately, the term "output" is also used as the target name, specifically referring to the summarized sentences. This usage conflicts with the "output" of the encoder used for the second MHA input.

The first MHA receives the target (the variable called output) and therefore should use the causal look-ahead mask, applied to the target (output).

Is that correct?
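
As a sanity check, here is a tiny standalone reproduction of the broadcasting failure (toy shapes and random tensors only, nothing copied from the assignment; I am only assuming the standard Keras MultiHeadAttention behaviour):

```python
import tensorflow as tf

batch, heads, d_model = 1, 2, 8
input_len, target_len = 150, 2   # encoder input length vs. tokens decoded so far

mha = tf.keras.layers.MultiHeadAttention(num_heads=heads, key_dim=d_model)

Q1 = tf.random.normal((batch, target_len, d_model))          # queries come from the decoder
enc_output = tf.random.normal((batch, input_len, d_model))   # keys/values come from the encoder

good_mask = tf.ones((batch, 1, input_len))   # like create_padding_mask(encoder_input): last dim matches the keys
bad_mask = tf.ones((batch, 1, target_len))   # like create_padding_mask(output): last dim is 2, not 150

out, _ = mha(Q1, enc_output, enc_output, attention_mask=good_mask,
             return_attention_scores=True)
print(out.shape)   # (1, 2, 8) -- works

try:
    mha(Q1, enc_output, enc_output, attention_mask=bad_mask,
        return_attention_scores=True)
except Exception as e:
    print(type(e).__name__)   # the same "required broadcastable shapes" failure as above
```

The mismatched mask reproduces the InvalidArgumentError from the traceback, which is why switching dec_padding_mask to create_padding_mask(encoder_input) makes the shapes line up.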


Could you say a bit more? Where exactly is the bug in the code? I had the same issue; I corrected the input to
look_ahead_mask = create_look_ahead_mask(…) and it seems to be working better. However, I'm not sure if that's the right solution.
I can share the code if you want.

@mats please create a new thread whenever you have an issue; comments on old threads don't trigger notifications and can get missed.

regards
DP

This worked for me, thanks a lot! Yeah, it makes sense: we have to mask out the padding tokens from the original input sequence.