In Exercise 3 (Decoder) I am getting the following errors:
Failed test case: Wrong values in x.
Expected: [1.6461557, -0.7657816, -0.04255769, -0.8378165]
Got: [ 1.5847092 -0.22151496 -0.17638591 -1.1868083 ]
Failed test case: Wrong values in att_weights[decoder_layer1_block1_self_att].
Expected: [0.51728565, 0.48271435, 0.0]
Got: [0.49889773 0.5011023 0. ]
Failed test case: Wrong values in outd when training=True.
Expected: [1.6286429, -0.7686589, 0.00983591, -0.86982]
Got: [ 1.5842563 -0.35723066 -0.06137132 -1.1656543 ]
Failed test case: Wrong values in outd when training=True and use padding mask.
Expected: [1.390952, 0.2794097, -0.2910638, -1.3792979]
Got: [ 1.1789213 0.6112976 -0.33207467 -1.4581443 ]
I have followed all the steps and hints given in the code block and have passed all the previous unit tests in the assignment, but I am unable to solve the issue. I have looked into similar posts and tried the solutions, but the issue still persists.
The code you shared from the Decoder graded cell is correct. My suspicion is the DecoderLayer call cell: for block 1, make sure you have used the look-ahead mask and set return_attention_scores to True.
You have added training=training to block 1 and block 2, which is not supposed to be there. The instructions mention that dropout is applied during training, so training=training is only passed when applying dropout to the ffn output.
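To make that concrete, here is a minimal sketch of how the two attention blocks and the ffn dropout could fit together. It assumes the Keras MultiHeadAttention API, and the layer names and constructor arguments (mha1, layernorm1, dropout_ffn, and so on) are illustrative, so match them to the ones in your own notebook rather than copying this verbatim.

```python
import tensorflow as tf

class DecoderLayer(tf.keras.layers.Layer):
    """Minimal sketch of a decoder layer; names and arguments are illustrative."""

    def __init__(self, embedding_dim, num_heads, fully_connected_dim, dropout_rate=0.1):
        super().__init__()
        self.mha1 = tf.keras.layers.MultiHeadAttention(num_heads, key_dim=embedding_dim)
        self.mha2 = tf.keras.layers.MultiHeadAttention(num_heads, key_dim=embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        # Block 1: self-attention with the look-ahead mask and
        # return_attention_scores=True -- no training=training here.
        attn1, attn_weights_block1 = self.mha1(
            x, x, x, look_ahead_mask, return_attention_scores=True)
        out1 = self.layernorm1(attn1 + x)

        # Block 2: cross-attention over the encoder output with the padding mask,
        # again without training=training.
        attn2, attn_weights_block2 = self.mha2(
            out1, enc_output, enc_output, padding_mask, return_attention_scores=True)
        out2 = self.layernorm2(attn2 + out1)

        # The ffn output is the only place that gets dropout with training=training.
        ffn_output = self.ffn(out2)
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        out3 = self.layernorm3(ffn_output + out2)

        return out3, attn_weights_block1, attn_weights_block2
```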
I tried removing the training parameter from the blocks.
It didn't make a difference in the unit tests for the decoder.
I'm not sure what is going wrong with the code.
Then the next suspect would be the earlier graded cells. I would check scaled dot product attention, as your error points towards wrong values of x.
Kindly share screenshots of both the scaled dot product attention and the Encoder graded cells.
Errors in the scaled dot product attention graded cell:
For the step that multiplies q by k transposed: you are using the wrong Python function for the multiplication.
The additional hints section just before the graded cell mentions:
you may find tf.matmul useful for matrix multiplication (check how you can use the parameter transpose_b)
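As a stand-alone illustration of what that parameter does (the shapes here are invented for the example):

```python
import tensorflow as tf

q = tf.random.uniform((1, 3, 4))   # (batch, seq_len_q, depth)
k = tf.random.uniform((1, 5, 4))   # (batch, seq_len_k, depth)

# transpose_b=True transposes the last two axes of k before multiplying,
# so the product has shape (batch, seq_len_q, seq_len_k)
matmul_qk = tf.matmul(q, k, transpose_b=True)
print(matmul_qk.shape)  # (1, 3, 5)
```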
Next, to calculate dk, kindly use tf.shape rather than k.shape. Also, as you know, dk is the dimension of the keys, which is used to scale everything down so the softmax doesn't explode; that dimension is the last axis, so index it with [-1], not -2.
In the next code line, when calculating the scaled attention logits, the denominator should be tf.math.sqrt(dk) rather than dk**0.5, since the formula divides by the square root of dk.
When adding the mask to the scaled tensor, your code is close, but we have seen that leaving out the decimal point can change the scaled weights. The instructions say to multiply (1. - mask) by -1e9, whereas you multiplied (1 - mask). Make sure you multiply exactly the way the instructions before the graded cell describe.
For the softmax normalization, you do not need to pass any axis argument, since the last axis is the default; you only need to apply the right activation function, which you did. So remove axis=-1.
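Putting those corrections together, the attention computation could look roughly like the sketch below. The function name and signature follow the common TensorFlow tutorial convention and may not match the notebook exactly, so treat it as a reference rather than a drop-in answer.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Multiply q by k transposed: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # dk is the dimension of the keys: the last axis, read with tf.shape
    # (cast to float so the square root below works)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)

    # Scale the logits by the square root of dk
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Multiply (1. - mask) by -1e9 so masked positions become very large negatives
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1e9

    # Softmax normalizes over the last axis by default, so no axis argument is needed
    attention_weights = tf.nn.softmax(scaled_attention_logits)

    # Weighted sum of the values: (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```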
Let me know how it goes after these corrections.
I apologize for causing you trouble.
I had followed your instructions and removed the training parameter from the attention blocks. However, when that did not change anything, I tried running it again with the training parameter in.
As per your response, I have commented out the parameter again; however, the result stays the same.
I have shared the screenshot with you. Please let me know if I am making any mistakes.
My apologies, this time I missed noticing a minor error. In the step that adds the positional encoding to the word embedding, you used
[:, seq_len, :] whereas it should be [:, :seq_len, :]
Check the instructions before the graded cell; they mention this. @hj320
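To make the difference concrete, here is a small stand-alone illustration of that slice; the tensor names and shapes are invented for the example.

```python
import tensorflow as tf

seq_len = 3
pos_encoding = tf.random.uniform((1, 10, 4))   # (1, maximum_position_encoding, embedding_dim)
x = tf.random.uniform((2, seq_len, 4))         # word embeddings: (batch, seq_len, embedding_dim)

# [:, :seq_len, :] keeps the first seq_len positions -> (1, 3, 4),
# which broadcasts correctly against x
x = x + pos_encoding[:, :seq_len, :]

# [:, seq_len, :] picks only the single row at index seq_len -> (1, 4),
# which still broadcasts but adds the wrong values
print(pos_encoding[:, :seq_len, :].shape)  # (1, 3, 4)
print(pos_encoding[:, seq_len, :].shape)   # (1, 4)
```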