C4W2_Assignment Transformer Summarizer Exercise 3 Decoder Failed test cases

In Exercise 3 (Decoder) I am getting the following errors:

Failed test case: Wrong values in x.
Expected: [1.6461557, -0.7657816, -0.04255769, -0.8378165]
Got: [ 1.5847092 -0.22151496 -0.17638591 -1.1868083 ]

Failed test case: Wrong values in att_weights[decoder_layer1_block1_self_att].
Expected: [0.51728565, 0.48271435, 0.0]
Got: [0.49889773 0.5011023 0. ]

Failed test case: Wrong values in outd when training=True.
Expected: [1.6286429, -0.7686589, 0.00983591, -0.86982]
Got: [ 1.5842563 -0.35723066 -0.06137132 -1.1656543 ]

Failed test case: Wrong values in outd when training=True and use padding mask.
Expected: [1.390952, 0.2794097, -0.2910638, -1.3792979]
Got: [ 1.1789213 0.6112976 -0.33207467 -1.4581443 ]

I have followed all the steps and hints given in the code block and have passed all the previous unit tests in the assignment, but I am unable to solve the issue. I have looked into similar posts and tried the solutions, but the issue still persists.

hi @hj320

Please click on my name and then message me a screenshot of only the Decoder graded cell code.

Please make sure not to post any code here.

Also confirm whether your previous unit tests passed.

Regards
DP

Hi @Deepti_Prasad

I have shared the screenshot.
Yes, all previous unit tests passed.

Thanks and Regards

hi @hj320

The code you shared from the Decoder graded cell is correct. My next suspicion is the DecoderLayer call cell: for block 1, make sure you have used the look-ahead mask and set return_attention_scores to True.
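
For reference, a minimal sketch of what block 1 can look like, assuming the layer and variable names from the assignment template (self.mha1, self.layernorm1, Q1); adjust the names to match your own notebook:

    # Block 1: masked multi-head self-attention on the decoder input
    mult_attn_out1, attn_weights_block1 = self.mha1(
        x, x, x,                          # query, value, key are all x (self-attention)
        attention_mask=look_ahead_mask,   # causal mask so a position cannot attend ahead
        return_attention_scores=True)     # needed so the test can inspect att_weights
    Q1 = self.layernorm1(mult_attn_out1 + x)  # residual connection + layer normalization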

Regards
DP

@hj320

You have added training=training to block 1 and block 2, which is not supposed to be there: the instructions mention that dropout is applied during training, so training=training should only be passed when applying dropout to the ffn output.
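
In other words, roughly like this, assuming the template's names (self.ffn, self.dropout_ffn, self.layernorm3, Q2); only the dropout call receives the training flag:

    # Feed-forward block: only the dropout layer gets training=training
    ffn_output = self.ffn(Q2)                                     # fully connected feed-forward network
    ffn_output = self.dropout_ffn(ffn_output, training=training)  # dropout is active only during training
    out3 = self.layernorm3(ffn_output + Q2)                       # residual connection + layer normalization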

Regards
DP

I tried removing the training parameter from the blocks.
It didn't make a difference in the unit tests for the Decoder.
I'm not sure what is going wrong with the code.

Thanks and Regards
hj320

Then the next suspect would be the previous graded cell code. I would check scaled dot product attention, as your error points towards wrong values in x.

Kindly share screenshots of both the scaled dot product attention and Encoder graded cells.

Regards
DP

hi @hj320

Errors in the scaled dot product attention graded cell (see the sketch after this list):

  1. To multiply q by k transposed, you are using the wrong function. The additional hints section just before the graded cell mentions that you may find tf.matmul useful for matrix multiplication (check how you can use the parameter transpose_b).

  2. Next, to calculate dk, kindly use tf.shape rather than k.shape. As you know, dk is the dimension of the keys, used to scale everything down so the softmax doesn't explode, so the index should be [-1], not -2. In the next line, when calculating the scaled attention logits, the denominator should be tf.math.sqrt(dk) rather than dk**0.5, since the formula calls for the square root of dk.

  3. When adding the mask to the scaled tensor, your code is close, but we have seen that omitting the decimal point changes the scaled weights: the instructions say to multiply (1. - mask) by -1e9, whereas you multiplied (1 - mask). Make sure you write it exactly as the instructions before the graded cell state.

  4. When normalizing with softmax, you do not need to pass any axis argument; you only need to use the right activation function, which you did. So remove axis=-1.
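
Putting points 1-4 together, a minimal sketch of the function, assuming the variable names from the assignment template:

    import tensorflow as tf

    def scaled_dot_product_attention(q, k, v, mask):
        # 1. multiply q by k transposed
        matmul_qk = tf.matmul(q, k, transpose_b=True)

        # 2. scale by the square root of dk, the key dimension (last axis)
        dk = tf.cast(tf.shape(k)[-1], tf.float32)
        scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

        # 3. add the mask to the scaled tensor, exactly as the instructions state
        if mask is not None:
            scaled_attention_logits += (1. - mask) * -1e9

        # 4. softmax over the last axis (the default), then weight the values
        attention_weights = tf.keras.activations.softmax(scaled_attention_logits)
        output = tf.matmul(attention_weights, v)

        return output, attention_weights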

Let me know your progress after making these corrections.

Regards
DP

Thanks for pointing out the corrections.
I have made the changes, but the outputs remain the same and the unit tests for the Decoder keep failing.

Regards
hj320

Please send a screenshot of how you have made the corrections. Also, if possible, share a screenshot of the failed tests by personal DM.

You are supposed to use tf.matmul rather than tf.linalg.matmul to multiply q and k, @hj320.

@hj320 Harsh, I think I told you to remove training=training from self-attention block 1 and block 2.

Apply training=training only to the ffn output while applying dropout, as per the instructions.

It seems like you aren't reading my responses!

Also, once you make the corrections, make sure you save and then re-run the cells one by one from the beginning.

I apologize for causing you trouble.
I had followed your instructions and removed the training parameter from the attention blocks. However, when that did not change anything, I tried running it again with the training parameter included.

As per your response, I have commented out the parameter again; however, the result stays the same.

I have shared the screenshot with you. Please let me know if I am making any mistakes.

My apologies this time, as I missed a minor error: when adding the positional encoding to the word embedding, you used
[:, seq_len, :] whereas it should be [:, :seq_len, :]
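
That is, a minimal sketch of the fix in the Decoder's call, assuming the template's attribute name self.pos_encoding and that seq_len is the sequence length of x:

    seq_len = tf.shape(x)[1]
    # slice the positional encoding up to seq_len; [:, seq_len, :] would index
    # a single position instead of the whole range of positions
    x += self.pos_encoding[:, :seq_len, :]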

Check the instructions before the graded cell; they mention this. @hj320

Thank you so much for your help.
I made the correction and the unit tests passed successfully.

Thanks and Regards,
hj320
