Course 5, Week 4: Size of attention_weights

What should be the size of the tensor attention_weights in the function scaled_dot_product_attention_test?

Should it be (3,1,14) or (3,1,3,4)?
Thanks, I am stuck!

Probably you should have posted this in Course 5, not Course 4.

I get (3, 4).

Yes, you are right, it should be Course 5.

I think my problem comes from the function create_padding_mask, which adds extra dimensions with
return seq[:, tf.newaxis, tf.newaxis, :]
as its last line.

I am not sure if I coded this or if it was there already…

Do you have this as well?

The create_padding_mask() function was provided in the notebook. You did not have to modify it.
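For anyone wondering what that last line does to the shape, here is a minimal sketch. It uses NumPy as a stand-in for TensorFlow (np.newaxis inserts a length-1 axis the same way tf.newaxis does), and the seq values are made up for illustration. It only reproduces the indexing on the return line, not the rest of the notebook's function.

```python
import numpy as np

# Hypothetical batch of 3 token-id sequences of length 4 (values are made up).
seq = np.array([[7, 6, 0, 0],
                [1, 2, 3, 0],
                [3, 0, 0, 0]])

# NumPy stand-in for the notebook's return line:
#   return seq[:, tf.newaxis, tf.newaxis, :]
# Two length-1 axes are inserted between the batch axis and the sequence axis.
expanded = seq[:, np.newaxis, np.newaxis, :]

print(seq.shape)       # (3, 4)
print(expanded.shape)  # (3, 1, 1, 4)
```

So a 2-D input of shape (3, 4) comes back as a 4-D tensor of shape (3, 1, 1, 4).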

OK, that’s what I thought.
Then I don’t understand where the error in my scaled_dot_product_attention comes from.

The output contains the right values, but the shape of the tensor, (3,1,3,2), seems to be wrong.

attention_weights uses tf.keras.activations.softmax(…).

And I think axis should be dk, not -1. I’m not sure whether -1 works there in all cases.
But, if you do that, dk should not include the square root. So you’d need to modify your code for the scaled_attention_logits.
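A side note on the axis question: in the standard formulation, softmax(Q K^T / sqrt(d_k)) V, the softmax normalizes each query's scores across the keys, and the keys sit on the last axis of the logits. Here is a minimal NumPy sketch of that normalization. The softmax helper and the logits values are made up for illustration and are not the notebook's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability;
    # this does not change the result.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Hypothetical logits with shape (seq_len_q, seq_len_k) = (3, 4).
logits = np.array([[2.0, 1.0, 0.0, 1.0],
                   [0.5, 0.5, 0.5, 0.5],
                   [3.0, 0.0, 0.0, 1.0]])

# Normalize over the last axis, i.e. over the keys for each query.
weights = softmax(logits, axis=-1)

print(weights.shape)         # (3, 4)
print(weights.sum(axis=-1))  # each row sums to 1 (up to rounding)
```

The shape is unchanged by the softmax, and each row (one per query) is a probability distribution over the keys.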

But it still is not working.
Can you confirm that the shape of the output of scaled_dot_product_attention should be (3,1,3,2) for scaled_dot_product_attention_test?

I get (3,2) for the output shape.

My bad, I was using the mask created earlier instead of the mask passed to that function.
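For future readers, here is a rough sketch of why the wrong mask changes the shapes. It is a NumPy approximation, not the notebook's solution. The function attention_sketch, the random inputs, and the mask convention (1 = keep, 0 = hide) are all assumptions made for illustration. The input shapes are chosen to match the ones reported in this thread.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def attention_sketch(q, k, v, mask):
    # Hypothetical NumPy version of scaled dot-product attention.
    # Assumed mask convention: 1 = keep the position, 0 = hide it.
    dk = k.shape[-1]
    logits = q @ k.T / np.sqrt(dk)           # (seq_len_q, seq_len_k)
    logits = logits + (1.0 - mask) * -1e9    # broadcasting happens here
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
q = rng.random((3, 4))  # 3 queries of depth 4
k = rng.random((4, 4))  # 4 keys of depth 4
v = rng.random((4, 2))  # 4 values of depth 2

# A 2-D mask with the same shape as the logits.
mask_2d = np.array([[1.0, 1.0, 0.0, 1.0]] * 3)      # (3, 4)
# The same mask with two extra length-1 axes, like a padding mask
# that was expanded with [:, newaxis, newaxis, :].
mask_4d = mask_2d[:, np.newaxis, np.newaxis, :]     # (3, 1, 1, 4)

out_2d, w_2d = attention_sketch(q, k, v, mask_2d)
out_4d, w_4d = attention_sketch(q, k, v, mask_4d)

print(w_2d.shape, out_2d.shape)  # (3, 4) (3, 2)
print(w_4d.shape, out_4d.shape)  # (3, 1, 3, 4) (3, 1, 3, 2)
```

Adding the 4-D mask to the 2-D logits broadcasts the result up to four dimensions, which would explain seeing (3,1,3,4) for the weights and (3,1,3,2) for the output instead of (3,4) and (3,2).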