Why does applying the padding mask change the tensor's shape? (C5W4Asn1)

Hello everyone!

Notice that when the padding mask gets applied, the shape changes (between exercises 2 and 3 of C5W4Asn1).
That seems incorrect to me. If I'm mistaken, please do explain!

# Softmax original sequence
tf.keras.activations.softmax(
    x
).numpy().shape

out → (3, 5)

# Softmax masked sequence
tf.keras.activations.softmax(
    x + (1 - create_padding_mask(x)) * -1.0e9
).numpy().shape

out → (3, 3, 5)

The reason there is an additional axis is that create_padding_mask() inserts a new axis (via tf.newaxis) into the mask it returns.
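For reference, here is a NumPy sketch of what the assignment's create_padding_mask() does (the exact notebook code is TensorFlow; this reproduces the same logic, including the axis insertion in question):

```python
import numpy as np

def create_padding_mask(decoder_token_ids):
    """Sketch of the notebook's padding mask: 1.0 where a real token is
    present, 0.0 where the token id is 0 (padding)."""
    seq = (decoder_token_ids != 0).astype(np.float32)
    # the extra axis being discussed: mask becomes (batch, 1, seq_len)
    return seq[:, np.newaxis, :]

x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)

mask = create_padding_mask(x)
print(mask.shape)  # (3, 1, 5), not (3, 5)
```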

I have the same question. I see that a new axis was inserted at the end of

def create_padding_mask(decoder_token_ids):

But why was it added?

If I print using the following

print(x)
print((1 - create_padding_mask(x)) * -1.0e9)

The output is the following:

tf.Tensor(
[[7. 6. 0. 0. 1.]
 [1. 2. 3. 0. 0.]
 [0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]]

 [[-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]]

 [[-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]]], shape=(3, 1, 5), dtype=float32)

Broadcasting doesn't seem to be what we want here: adding the (3, 1, 5) mask to the (3, 5) input changes the shape of the output.
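The shape change follows directly from NumPy/TensorFlow broadcasting rules, which align trailing axes. A minimal sketch with plain NumPy (dummy values, not the notebook's data):

```python
import numpy as np

x = np.zeros((3, 5), dtype=np.float32)        # input:  (3, 5)
mask3d = np.ones((3, 1, 5), dtype=np.float32) # mask:   (3, 1, 5)

# Broadcasting aligns trailing axes:
#   x      (3, 5)    -> treated as (1, 3, 5)
#   mask3d (3, 1, 5) ->            (3, 1, 5)
# so every one of the 3 masks is applied to every one of the 3 rows.
print((x + mask3d).shape)  # (3, 3, 5)
```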

If I remove the axis insertion, the result is much more sensible:

tf.Tensor(
[[7. 6. 0. 0. 1.]
 [1. 2. 3. 0. 0.]
 [0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]
 [-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]
 [-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]], shape=(3, 5), dtype=float32)

Here you can see the large negative values (-1e9) occur at exactly the spots corresponding to zeros in x.

Perhaps the new dimension is needed later in the notebook; I haven't gotten that far yet. But in this section:

print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9))

It doesn’t make sense.
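My understanding (an assumption based on how scaled dot-product attention is usually implemented, not a statement about this exact notebook) is that the extra axis exists for the attention computation later on: there the mask is added to score tensors of shape (batch, len_q, len_k), and a (batch, 1, len_k) mask broadcasts over the query axis without changing the shape. A NumPy sketch with assumed shapes:

```python
import numpy as np

batch, len_q, len_k = 3, 4, 5

# attention scores: one row of key scores per query position
scores = np.zeros((batch, len_q, len_k), dtype=np.float32)

# mask shaped like create_padding_mask's output: (batch, 1, len_k)
mask = np.ones((batch, 1, len_k), dtype=np.float32)
mask[:, :, -1] = 0.0  # pretend the last key position is padding

masked = scores + (1 - mask) * -1.0e9
print(masked.shape)  # (3, 4, 5): the mask broadcasts over len_q, shape unchanged
```

So against a 3-D score tensor the extra axis is harmless and saves tiling the mask per query position; it only looks wrong when the mask is added to the 2-D x directly, as in this section of the notebook.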
