hi @hj320
Errors in the Scaled dot product attention graded cell.
-
First, for the line that multiplies q by k transposed: you are using the wrong Python function to do that multiplication.
In the additional hints section just before the graded cell, it mentions:
"you may find tf.matmul useful for matrix multiplication (check how you can use the parameter transpose_b)" -
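In case it helps, here is a minimal sketch of that first step; the toy q and k tensors are only there to make it runnable and show the shapes (in the notebook they are the function arguments):

```python
import tensorflow as tf

# Toy tensors just to illustrate shapes.
q = tf.random.uniform((1, 3, 4))   # (..., seq_len_q, depth)
k = tf.random.uniform((1, 5, 4))   # (..., seq_len_k, depth)

# transpose_b=True transposes the last two dimensions of k inside the call,
# so this computes q @ k^T without a separate tf.transpose.
matmul_qk = tf.matmul(q, k, transpose_b=True)
print(matmul_qk.shape)  # (1, 3, 5) -> (..., seq_len_q, seq_len_k)
```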
Next, to calculate dk, kindly use tf.shape rather than k.shape. As you know, dk is the dimension of the keys, which is used to scale everything down so the softmax doesn't explode; that dimension is the last axis of k, so the index should be [-1], not -2.
In the next code line, to calculate the scaled attention logits, the denominator is supposed to be tf.math.sqrt(dk), not dk**0.5, since the formula divides by the square root of dk. -
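Continuing the same sketch, those two lines could look like this (the cast to float32 is my assumption; tf.math.sqrt needs a float input, and tf.shape returns integers):

```python
# dk is the size of the last dimension of k (the key depth).
dk = tf.cast(tf.shape(k)[-1], tf.float32)

# Scale by the square root of dk so the logits stay in a reasonable range.
scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)
```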
When adding the mask to the scaled tensor, your code is close, but we have seen that leaving out the decimal point can make a difference to the scaled weights. The instructions just before the graded cell say to multiply (1. - mask) by -1e9, whereas you multiplied (1 - mask). Make sure you write it exactly the way the instructions show.
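A minimal sketch of that masking step, assuming the notebook's convention of a 0/1 mask where 1 means the position is kept (the toy mask below just hides the last key position):

```python
# Toy mask: 1. at positions to keep, 0. at positions to hide.
mask = tf.concat([tf.ones((1, 3, 4)), tf.zeros((1, 3, 1))], axis=-1)

if mask is not None:
    # Write 1. with the decimal point, exactly as the instructions show.
    # (1. - mask) is 1 at the hidden positions, so adding -1e9 there drives
    # their softmax weight to ~0.
    scaled_attention_logits += (1. - mask) * -1e9
```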
-
When normalizing with softmax, you do not need to pass an axis argument; you only need to call the right activation function, which you did, and its default already normalizes over the last axis. So remove axis=-1.
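For example, assuming you are using tf.keras.activations.softmax (tf.nn.softmax behaves the same way here), the default axis is already the last one:

```python
# Default axis is -1, i.e. across the keys, so no axis argument is needed.
attention_weights = tf.keras.activations.softmax(scaled_attention_logits)
print(tf.reduce_sum(attention_weights, axis=-1))  # each row sums to ~1
```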
Let me know how it goes after making these corrections.
Regards
DP