C4W2 Assignment Exercise 5 - next_word

I have completed Exercise 5 - next_word of the assignment and all the test cases pass; I also get the full grade for the assignment, but the model does not output anything. I am not sure if I am doing something wrong that none of the test cases capture, or if there is a problem with the notebook:
Predicted token:
Predicted word:

Expected Output

Predicted token: [[14859]]
Predicted word: masses

Training set example:
[SOS] amanda: i baked cookies. do you want some? jerry: sure! amanda: i’ll bring you tomorrow :slight_smile: [EOS]

Human written summary:
[SOS] amanda baked cookies and will bring jerry some tomorrow. [EOS]

Model written summary:
[SOS]

Strange that the unit test did not catch the error. Please send me a copy of your code in a private chat and I will give it a look.

The issue was found in how the output target variable was being manipulated.

```
---> 67 mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)
     69 # apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
     70 mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)

InvalidArgumentError: Exception encountered when calling layer 'softmax_58' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_58' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I got an error at next_word, but passed all previous unit tests. I am not sure how to proceed.

@scheine I can't be very sure without looking at the code, but have a look at look_ahead_mask or dec_padding_mask in the next_word function; that might be the point of failure. Note that create_look_ahead_mask takes a sequence length as its argument, unlike create_padding_mask.
If that does not solve it, send me your code.
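
For illustration only, toy stand-ins for the two helpers (written from memory to show the argument types and resulting mask shapes, not the notebook's graded code) would look roughly like this:

```python
import tensorflow as tf

# Toy stand-ins for the notebook helpers -- check your own notebook for the graded versions.
def create_padding_mask(decoder_token_ids):
    # takes a TENSOR of token ids: 1.0 where there is a real token, 0.0 at padding
    mask = 1.0 - tf.cast(tf.math.equal(decoder_token_ids, 0), tf.float32)
    return mask[:, tf.newaxis, :]                      # (batch, 1, seq_len)

def create_look_ahead_mask(sequence_length):
    # takes an INTEGER length and builds a lower-triangular (causal) mask
    return tf.linalg.band_part(tf.ones((1, sequence_length, sequence_length)), -1, 0)

output = tf.constant([[7, 14859]])                     # tokens generated so far
print(create_padding_mask(output).shape)               # (1, 1, 2)
print(create_look_ahead_mask(output.shape[1]).shape)   # (1, 2, 2)
```

The shapes here happen to match the masks printed further down in this thread, but treat the bodies as sketches, not as the assignment solution.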

Thanks, I fixed the shape issue after updating the code in create_padding_mask.

@scheine I am glad.
I would also request that you remove the solution code from your reply, as sharing solutions is against our community guidelines.

Regards,

Thanks, updated.


@jyadav202 I have the same error, and all unit tests passed. Can you give a hint about what is wrong?

```
mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, attention_mask = padding_mask, return_attention_scores=True)
     69 # apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
     70 mult_attn_out2 = self.layernorm2(Q1 + mult_attn_out2)

InvalidArgumentError: Exception encountered when calling layer 'softmax_92' (type Softmax).

{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:

Call arguments received by layer 'softmax_92' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I’m not a mentor for this course, but the assignment seems similar to one from DLS Course 5 Week 4.

I would guess the issue is that your Q1 is not the correct shape.

Hi!
Yes, please recheck Q1, and also check whether your output matches the "Expected Output". If the error still persists, send me a copy of your assignment and I will take a look.

Hi,
My problem is similar, except my error is "softmax_404".
My graded function next_word gives the expected output [[14859]] and masses, it passes the tests, and as part of this:
w2_unittest.test_next_word(next_word, transformer, encoder_input, output)
it prints "All tests passed!"

But when I try to summarize a sentence, this line:
summarize(transformer, document[training_set_example])

references this code from DecoderLayer.call:
mult_attn_out2, attn_weights_block2 = self.mha2(… )

which throws the following error:
```
InvalidArgumentError: Exception encountered when calling layer 'softmax_404' (type Softmax).
{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:AddV2] name:
Call arguments received by layer 'softmax_404' (type Softmax):
  • inputs=tf.Tensor(shape=(1, 2, 2, 150), dtype=float32)
  • mask=tf.Tensor(shape=(1, 1, 1, 2), dtype=float32)
```

I printed these from next_word:
```
enc_padding_mask: tf.Tensor(
[[[1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]], shape=(1, 1, 150), dtype=float32)
look_ahead_mask: tf.Tensor([[[1.]]], shape=(1, 1, 1), dtype=float32)
dec_padding_mask: tf.Tensor([[[1.]]], shape=(1, 1, 1), dtype=float32)
output: tf.Tensor([[7]], shape=(1, 1), dtype=int32)
Predicted token: [[14859]]
Predicted word: masses
```

Any suggestions would be greatly appreciated. Thanks.
John

Using
dec_padding_mask = create_padding_mask(encoder_input) rather than dec_padding_mask = create_padding_mask(output) fixed it, and it works now.

My understanding is that the dec_padding_mask is based on the encoder_input, as it is intended to mask out the padding tokens in the original input sequence, which flows through the encoder and becomes the enc_output.

The padding mask is applied in the second MHA block, which receives the encoder's output as keys and values. This padding should be identical to the padding applied to the original encoder input.

Unfortunately, the term "output" is also used as the target name, specifically referring to the summarized sentences. This usage conflicts with the "output" of the encoder used for the second MHA input.

The first MHA receives the target (the variable called output) and therefore should use the causal look-ahead mask, applied to the target (output).

Is that correct?
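
As a sanity check, here is a tiny standalone reproduction of the broadcasting failure (toy shapes and random tensors only, nothing copied from the assignment; I am only assuming the standard Keras MultiHeadAttention behaviour):

```python
import tensorflow as tf

batch, heads, d_model = 1, 2, 8
input_len, target_len = 150, 2   # encoder input length vs. tokens decoded so far

mha = tf.keras.layers.MultiHeadAttention(num_heads=heads, key_dim=d_model)

Q1 = tf.random.normal((batch, target_len, d_model))          # queries come from the decoder
enc_output = tf.random.normal((batch, input_len, d_model))   # keys/values come from the encoder

good_mask = tf.ones((batch, 1, input_len))   # like create_padding_mask(encoder_input): last dim matches the keys
bad_mask = tf.ones((batch, 1, target_len))   # like create_padding_mask(output): last dim is 2, not 150

out, _ = mha(Q1, enc_output, enc_output, attention_mask=good_mask,
             return_attention_scores=True)
print(out.shape)   # (1, 2, 8) -- works

try:
    mha(Q1, enc_output, enc_output, attention_mask=bad_mask,
        return_attention_scores=True)
except Exception as e:
    print(type(e).__name__)   # the same "required broadcastable shapes" failure as above
```

The mismatched mask reproduces the InvalidArgumentError from the traceback, which is why switching dec_padding_mask to create_padding_mask(encoder_input) makes the shapes line up.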


Could you say a bit more? Where exactly is the bug in the code? I had the same issue; I corrected the input to
look_ahead_mask = create_look_ahead_mask(…) and it seems to be working better. However, I'm not sure if that's the right solution.
I can share the code if you want.

@mats please create a new thread whenever you have an issue; comments on old threads don't trigger notifications and can get missed.

regards
DP

This worked for me, thanks a lot! Yeah, it makes sense: we have to mask out the padding tokens from the original input sequence.