Hi,
What should be the size of the tensor attention_weights in the function scaled_dot_product_attention_test?
Should it be (3,1,14) or (3,1,3,4)?
Thanks, I am stuck!
Probably you should have posted this in Course 5, not Course 4.
I get (3, 4).
Yes, you are right, it should be course 5.
I think my problem comes from the function create_padding_mask, which creates an extra dimension with
return seq[:, tf.newaxis, tf.newaxis, :]
as the last line.
I am not sure if I coded this or if it was there already…
Do you have this as well?
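For reference, here is roughly what that helper looks like in my notebook (a sketch, assuming it follows the TensorFlow Transformer tutorial the assignment seems based on): the two tf.newaxis insertions turn a (batch, seq_len) mask into (batch, 1, 1, seq_len) so it can broadcast across heads and query positions.

```python
import tensorflow as tf

def create_padding_mask(seq):
    # 1.0 where the token id is 0 (padding), 0.0 elsewhere
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # two broadcast axes so the mask applies to
    # (batch, heads, seq_len_q, seq_len_k) attention logits
    return seq[:, tf.newaxis, tf.newaxis, :]  # (batch, 1, 1, seq_len)

x = tf.constant([[7, 6, 0, 0, 1], [1, 2, 3, 0, 0], [0, 0, 0, 4, 5]])
print(create_padding_mask(x).shape)  # (3, 1, 1, 5)
```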
The create_padding_mask() function was provided in the notebook. You did not have to modify it.
ok, that’s what I thought.
Then I don’t understand where the error for my scaled_dot_product_attention comes from.
It contains the right values, but it seems the shape of the tensor, (3,1,3,2), is not correct.
attention_weights uses tf.keras.activations.softmax(…).
When you get your code working, please edit the replies that contain your code and delete it. That keeps you in line with the course Honor Code.
And I think axis should be dk, not -1. I am not sure whether -1 works there in all cases.
But if you do that, dk should not include the square root, so you would need to modify your code for the scaled_attention_logits.
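As a generic illustration (deliberately not the assignment code, and with made-up tensor sizes), this is how the sqrt(dk) scaling and the softmax interact: with axis=-1 the softmax normalizes over the keys axis, so each query's weights sum to 1.

```python
import tensorflow as tf

# made-up sizes for illustration only
q = tf.random.normal((3, 4))  # (seq_len_q, depth)
k = tf.random.normal((4, 4))  # (seq_len_k, depth)

# scale the raw logits by sqrt(dk), then softmax over the
# last axis (the keys axis)
dk = tf.cast(tf.shape(k)[-1], tf.float32)
scaled_logits = tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(dk)
weights = tf.keras.activations.softmax(scaled_logits, axis=-1)

print(weights.shape)                    # (3, 4)
print(tf.reduce_sum(weights, axis=-1))  # each entry ~1.0
```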
Thanks.
But it still isn't working.
Can you confirm that the shape of the output of scaled_dot_product_attention should be (3,1,3,2) for the test scaled_dot_product_attention_test?
No.
I get (3,2) for the output shape.
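Here is a quick shape check consistent with those numbers. The input sizes below are my assumption, chosen only to reproduce the shapes quoted in this thread: (3, 4) for the attention weights and (3, 2) for the output.

```python
import tensorflow as tf

# assumed input sizes, consistent with the shapes quoted above
q = tf.random.normal((3, 4))
k = tf.random.normal((4, 4))
v = tf.random.normal((4, 2))

dk = tf.cast(tf.shape(k)[-1], tf.float32)
weights = tf.keras.activations.softmax(
    tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(dk), axis=-1)
output = tf.matmul(weights, v)

print(weights.shape, output.shape)  # (3, 4) (3, 2)
```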
My bad, I was using the mask created earlier instead of the mask given to that function.