I have an issue with the size of the masked sequences.
Why does the softmax change the shape of the tensor?
Running this:
print(tf.keras.activations.softmax(x).shape)
print(tf.keras.activations.softmax(x + (create_padding_mask(x) * -1.0e9)).shape)
print(x.shape)
print(create_padding_mask(x).shape)
Gives:
(3, 5)
(3, 1, 3, 5)
(3, 5)
(3, 1, 1, 5)
It seems that the shape (3, 1, 3, 5) is not what I should get, and that causes me trouble afterwards.
Thanks!
softmax won’t change the shape, but (x + create_padding_mask(x) * -1.0e9) will.
x shape: (3, 5)
mask shape: (3, 1, 1, 5)
summation shape: (3, 1, 3, 5)
Please refer to the broadcasting rules.
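As a minimal sketch with placeholder random tensors of the same shapes (these names are just for illustration, not the assignment's data):

import tensorflow as tf

x = tf.random.uniform((3, 5))           # same shape as x above: (3, 5)
mask = tf.random.uniform((3, 1, 1, 5))  # same shape as create_padding_mask(x): (3, 1, 1, 5)

# Broadcasting aligns shapes from the right: (3, 5) behaves like (1, 1, 3, 5),
# so adding it to (3, 1, 1, 5) gives (3, 1, 3, 5).
print((x + mask * -1.0e9).shape)        # (3, 1, 3, 5)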
Thank you Edward.
But then I have the wrong shape for the output of my scaled_dot_product_attention function.
In the scaled_dot_product_attention_test, the output has shape (3, 1, 3, 2), but it looks like it should be (3, 2).
Could you help me with that?
Thanks
The scaled_dot_product_attention function is already given the mask for you; you should not use create_padding_mask to generate your own mask inside it.
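For reference, here is a minimal sketch of the usual scaled dot-product attention pattern (assuming the same masking convention as the snippet above, i.e. the mask is 1 at padded positions and is added as mask * -1.0e9); the key point is that the mask arrives as an argument and is applied inside the function:

import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # Attention logits: (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    if mask is not None:
        # Use the mask passed in; do NOT call create_padding_mask here.
        scaled_attention_logits += (mask * -1.0e9)

    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)
    return output, attention_weights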
Thanks a lot! Damn, I was dumb!
omg I did the same thing!!! Thanks @edwardyu mentor for pointing this out.