Why does applying the padding mask change the tensor's shape? (C5W4Asn1)

Hello everyone!

Notice that when the padding mask gets applied, the shape changes (between exercises 2 and 3 of C5W4Asn1).
That seems incorrect to me. If I'm mistaken, please do explain!

# Softmax original sequence
tf.keras.activations.softmax(
    x
).numpy().shape

out → (3, 5)

# Softmax masked sequence
tf.keras.activations.softmax(
    x + (1 - create_padding_mask(x)) * -1.0e9
).numpy().shape

out → (3, 3, 5)

The reason there is an additional axis is that create_padding_mask() inserts a new axis (via tf.newaxis) into the mask it returns.
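For reference, here is a NumPy sketch of what the assignment's create_padding_mask() does (the exact notebook code is TensorFlow; this reproduces the same logic, including the axis insertion in question):

```python
import numpy as np

def create_padding_mask(decoder_token_ids):
    """Sketch of the notebook's padding mask: 1.0 where a real token is
    present, 0.0 where the token id is 0 (padding)."""
    seq = (decoder_token_ids != 0).astype(np.float32)
    # the extra axis being discussed: mask becomes (batch, 1, seq_len)
    return seq[:, np.newaxis, :]

x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)

mask = create_padding_mask(x)
print(mask.shape)  # (3, 1, 5), not (3, 5)
```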

I have the same question. I see that a new axis was inserted at the end of

def create_padding_mask(decoder_token_ids):

But why was it added?

If I print using the following

print(x)
print((1 - create_padding_mask(x)) * -1.0e9)

The output is the following:

tf.Tensor(
[[7. 6. 0. 0. 1.]
 [1. 2. 3. 0. 0.]
 [0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]]

 [[-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]]

 [[-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]]], shape=(3, 1, 5), dtype=float32)

Broadcasting doesn't seem to be what we want here: adding the (3, 1, 5) mask to the (3, 5) input changes the shape of the output.
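The shape change follows directly from NumPy/TensorFlow broadcasting rules, which align trailing axes. A minimal sketch with plain NumPy (dummy values, not the notebook's data):

```python
import numpy as np

x = np.zeros((3, 5), dtype=np.float32)        # input:  (3, 5)
mask3d = np.ones((3, 1, 5), dtype=np.float32) # mask:   (3, 1, 5)

# Broadcasting aligns trailing axes:
#   x      (3, 5)    -> treated as (1, 3, 5)
#   mask3d (3, 1, 5) ->            (3, 1, 5)
# so every one of the 3 masks is applied to every one of the 3 rows.
print((x + mask3d).shape)  # (3, 3, 5)
```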

If I remove the axis insertion, the result is much more sensible:

tf.Tensor(
[[7. 6. 0. 0. 1.]
 [1. 2. 3. 0. 0.]
 [0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]
 [-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]
 [-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]], shape=(3, 5), dtype=float32)

Here you can see the large negative values (-1e9) occur at exactly the spots corresponding to zeros in x.

Perhaps the new dimension is needed later in the notebook; I haven't gotten that far yet. But in this section:

print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9))

It doesn’t make sense.
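My understanding (an assumption based on how scaled dot-product attention is usually implemented, not a statement about this exact notebook) is that the extra axis exists for the attention computation later on: there the mask is added to score tensors of shape (batch, len_q, len_k), and a (batch, 1, len_k) mask broadcasts over the query axis without changing the shape. A NumPy sketch with assumed shapes:

```python
import numpy as np

batch, len_q, len_k = 3, 4, 5

# attention scores: one row of key scores per query position
scores = np.zeros((batch, len_q, len_k), dtype=np.float32)

# mask shaped like create_padding_mask's output: (batch, 1, len_k)
mask = np.ones((batch, 1, len_k), dtype=np.float32)
mask[:, :, -1] = 0.0  # pretend the last key position is padding

masked = scores + (1 - mask) * -1.0e9
print(masked.shape)  # (3, 4, 5): the mask broadcasts over len_q, shape unchanged
```

So against a 3-D score tensor the extra axis is harmless and saves tiling the mask per query position; it only looks wrong when the mask is added to the 2-D x directly, as in this section of the notebook.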
