As far as I understand, create_padding_mask(x) is supposed to take an (n, m) matrix as input and return an (n, 1, m) binary tensor as output.
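For reference, this is roughly what I understand the implementation to look like (my own paraphrase of the notebook's function, so details may differ):

```python
import tensorflow as tf

def create_padding_mask(x):
    # 1 for real (non-zero) tokens, 0 for padding (zero) tokens
    seq = 1 - tf.cast(tf.math.equal(x, 0), tf.float32)
    # add a middle axis so the mask can broadcast over attention rows:
    # (n, m) -> (n, 1, m)
    return seq[:, tf.newaxis, :]
```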
Yet when I use it in my code for Exercise 3, scaled_dot_product_attention, that is not what I observe. Specifically, with an input of size (3, 4) I end up with an output of size (3, 3, 4).
I have the following code:
```python
print(scaled_attention_logits.shape)
scaled_attention = scaled_attention_logits + (1 - create_padding_mask(scaled_attention_logits)) * -1.0e9
print(scaled_attention.shape)
```
scaled_attention_logits has shape (3, 4), as expected.
scaled_attention, however, comes out with shape (3, 3, 4). Can you please tell me why?
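Here is a hypothetical standalone snippet that reproduces the shapes I am seeing, assuming create_padding_mask works as in my sketch above (the random input values don't matter here, only the shapes do):

```python
import tensorflow as tf

def create_padding_mask(x):
    # same sketch as above: (n, m) -> (n, 1, m)
    seq = 1 - tf.cast(tf.math.equal(x, 0), tf.float32)
    return seq[:, tf.newaxis, :]

scaled_attention_logits = tf.random.uniform((3, 4))
print(scaled_attention_logits.shape)  # (3, 4)

# mask has shape (3, 1, 4), so adding it to the (3, 4) logits
# produces a (3, 3, 4) result
scaled_attention = scaled_attention_logits + \
    (1 - create_padding_mask(scaled_attention_logits)) * -1.0e9
print(scaled_attention.shape)  # (3, 3, 4)
```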