In Exercise 3 (scaled_dot_product_attention) I am getting an assertion error, even though the values I print show no difference between my calculated output and the expected output tensor. How can the assertion fail if my output and the expected output are the same, or very close?
Hello, @Steven22,
The assert checks the attention weights, but it seems you printed the output. Did you perhaps swap the two somewhere?
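A quick sanity check (just a sketch, not the notebook's test code; expected_weights here is a placeholder for whatever array the test cell hard-codes) is to print both returned tensors and compare each against the expected values yourself:

import numpy as np

out, weights = scaled_dot_product_attention(q, k, v, mask)
print("output:\n", out.numpy())
print("attention_weights:\n", weights.numpy())
# expected_weights is a placeholder for the array the assert compares against
print("weights match expected:", np.allclose(weights.numpy(), expected_weights, atol=1e-6))

That way you can see which of the two tensors the assert is really complaining about.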
Cheers,
Raymond
There are lots of details to get right, of course. My usual debugging method is to add print statements that show the intermediate values. Here's what I see when I run the test cell for that function with code that passes both the unit tests and the grader:
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
attention_weights.shape (3, 4)
attention_weights =
[[0.2589478 0.42693272 0.15705977 0.15705977]
[0.2772748 0.2772748 0.2772748 0.16817567]
[0.33620113 0.33620113 0.12368149 0.2039163 ]]
sum(attention_weights(axis = -1)) =
[[1.0000001]
[1. ]
[1. ]]
output.shape (3, 2)
output =
[[0.74105227 0.15705977]
[0.7227253 0.16817567]
[0.6637989 0.2039163 ]]
q.shape (3, 4)
k.shape (4, 4)
v.shape (4, 2)
matmul_qk.shape (3, 4)
matmul_qk =
[[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4.0
type(scaled_attention_logits) <class 'tensorflow.python.framework.ops.EagerTensor'>
scaled_attention_logits.shape (3, 4)
mask.shape (1, 3, 4)
applying mask =
[[[1 1 0 1]
[1 1 0 1]
[1 1 0 1]]]
attention_weights.shape (1, 3, 4)
attention_weights =
[[[0.3071959 0.5064804 0. 0.18632373]
[0.38365173 0.38365173 0. 0.23269653]
[0.38365173 0.38365173 0. 0.23269653]]]
sum(attention_weights(axis = -1)) =
[[[1.]
[1.]
[1.]]]
output.shape (1, 3, 2)
output =
[[[0.6928041 0.18632373]
[0.61634827 0.23269653]
[0.61634827 0.23269653]]]
All tests passed
One approach would be to add similar print statements to your code and see if the comparison sheds any light on where our code differs.
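For reference, here is a minimal sketch of what the standard scaled dot-product attention computation typically looks like in TensorFlow, with the same kind of print statements. This is not the assignment's reference solution, just the usual sequence of steps (multiply Q by K transposed, scale by the square root of d_k, apply the optional mask, softmax over the key axis, then weight the values); the variable names simply mirror the printout above.

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Q times K-transposed gives one score per (query, key) pair.
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    print("matmul_qk.shape", matmul_qk.shape)

    # Scale by sqrt(d_k) so the logits do not grow with the key dimension.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.sqrt(dk)

    # Masked positions (mask value 0) get a large negative logit,
    # so the softmax pushes their weights to ~0.
    if mask is not None:
        scaled_attention_logits += (1.0 - tf.cast(mask, tf.float32)) * -1e9

    # Softmax over the key axis; each row of weights should sum to 1.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    print("attention_weights =\n", attention_weights.numpy())

    # Weighted sum of the values.
    output = tf.matmul(attention_weights, v)
    print("output =\n", output.numpy())
    return output, attention_weights

Calling it with random (3, 4), (4, 4) and (4, 2) tensors for q, k and v reproduces the shapes shown in the printout above.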
Paul, I finally got Exercise 3 to work. Your outputs gave me something to shoot for. I was reading too much into the explanations of the algorithm and had extra lines of code that were unnecessary: I was using tf.keras.utils.normalize() to normalize the scaled logits matrix before applying the softmax activation.
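In case anyone else makes the same mistake: tf.keras.utils.normalize applies an L2 normalization, which rescales the logits and therefore changes the softmax probabilities, so the resulting attention_weights no longer match the expected values. The only scaling the formula calls for is the division by the square root of d_k before the softmax. A small illustration (the toy logits here happen to be the first row of the scaled logits from the printout above, not the assignment's data):

import tensorflow as tf

logits = tf.constant([[1.0, 1.5, 0.5, 0.5]])

# Softmax directly on the scaled logits (what the exercise expects).
print(tf.nn.softmax(logits, axis=-1).numpy())

# An extra L2 normalization first rescales the logits, so the
# probabilities (and hence the attention weights) come out different.
l2_normalized = tf.keras.utils.normalize(logits.numpy(), axis=-1)
print(tf.nn.softmax(l2_normalized, axis=-1).numpy())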
That’s good news. Glad to hear that having the intermediate results was helpful. Thanks for confirming!