C5 W4 A1: Why is create_padding_mask adding more dimensions than it's supposed to?

As far as I understand, create_padding_mask(x) is supposed to take an (n, m) matrix as input and return an (n, 1, m) binary tensor.
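For reference, this is roughly the behavior I expect, sketched in NumPy to avoid a TensorFlow dependency (the mask logic here is my own reconstruction, not the assignment's exact code; np.newaxis plays the role of tf.newaxis):

```python
import numpy as np

def create_padding_mask(x):
    # Hypothetical reconstruction of the assignment's helper:
    # 1.0 where the token is real, 0.0 where it is padding (token id 0),
    # with a size-1 axis inserted in the middle via np.newaxis.
    mask = (x != 0).astype(np.float32)
    return mask[:, np.newaxis, :]  # shape (n, 1, m)

x = np.array([[7, 6, 0, 0],
              [1, 2, 3, 0],
              [0, 0, 0, 5]])
print(create_padding_mask(x).shape)  # (3, 1, 4)
```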

Yet, when I use it in my code for Exercise 3, scaled_dot_product_attention, that is not what happens. Specifically, I find that it takes an input of size (3, 4) and returns an output of size (3, 3, 4).

I have the following code:
scaled_attention = scaled_attention_logits + (1 - create_padding_mask(scaled_attention_logits)) * -1.0e9

scaled_attention_logits has shape (3, 4), as expected.
scaled_attention has shape (3, 3, 4) for some reason. Can you please tell me why?

Please be sure you’ve read the Note text just above the “create_padding_mask()” cell.

Also, I initially thought it would be more correct to say that the output of create_padding_mask() has size (n, None, m), because tf.newaxis is defined as None; but indexing with tf.newaxis inserts an axis of size 1, so (n, 1, m) is the actual shape.
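On that last point: tf.newaxis is indeed an alias for None, but using it as an index inserts an axis of size 1. np.newaxis behaves the same way, so a NumPy sketch (used here only to avoid a TensorFlow dependency) shows both facts:

```python
import numpy as np

# np.newaxis (like tf.newaxis) is literally the constant None.
print(np.newaxis is None)  # True

x = np.zeros((3, 4))
y = x[:, np.newaxis, :]
print(y.shape)  # (3, 1, 4) -- the inserted axis has size 1, not None
```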