A112
August 20, 2022, 11:25pm
1
Hello everyone!
Notice that when the padding mask gets applied, the shape changes (between Exercises 2 and 3 of C5W5Asn1).
That seems incorrect to me. If I’m mistaken, please do explain!!
# Softmax original sequence
tf.keras.activations.softmax(
x
).numpy().shape
out → (3, 5)
# Softmax masked sequence
tf.keras.activations.softmax(
x + (1 - create_padding_mask(x)) * -1.0e9
).numpy().shape
out → (3, 3, 5)
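For anyone who wants to reproduce this, here is a self-contained sketch. The body of create_padding_mask below is my reconstruction of the notebook helper (the tf.newaxis slice at the end is the part in question); x is the example tensor printed further down in this thread.
import tensorflow as tf

# Example sequences from the notebook section (0 is the padding token)
x = tf.constant([[7., 6., 0., 0., 1.],
                 [1., 2., 3., 0., 0.],
                 [0., 0., 0., 4., 5.]])

def create_padding_mask(decoder_token_ids):
    # 1 for real tokens, 0 for padding (my reconstruction of the helper)
    seq = 1 - tf.cast(tf.math.equal(decoder_token_ids, 0), tf.float32)
    return seq[:, tf.newaxis, :]  # (batch, 1, seq_len): the axis in question

print(tf.keras.activations.softmax(x).numpy().shape)
# (3, 5)
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9).numpy().shape)
# (3, 3, 5): x with shape (3, 5) broadcast against the (3, 1, 5) mask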
TMosh
August 24, 2022, 5:22am
2
The reason there is an additional axis is that create_padding_mask() inserts a tf.newaxis into the tensor it returns.
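Concretely, a tf.newaxis slice turns a (batch, seq_len) tensor into (batch, 1, seq_len). A minimal sketch (the shapes match this thread's example; the values are placeholders):
import tensorflow as tf

seq = tf.ones((3, 5))                # mask before the slice: (batch, seq_len)
print(seq[:, tf.newaxis, :].shape)   # (3, 1, 5): the additional axis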
I have the same question. I see that a new axis was inserted at the end of
def create_padding_mask(decoder_token_ids):
But why was it added?
If I print the following:
print(x)
print((1 - create_padding_mask(x)) * -1.0e9)
The output is the following:
tf.Tensor(
[[7. 6. 0. 0. 1.]
[1. 2. 3. 0. 0.]
[0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]]

 [[-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]]

 [[-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]]], shape=(3, 1, 5), dtype=float32)
It isn’t obvious to me that broadcasting is desired here; the extra axis changes the shape of the output.
If I remove the axis insertion, the result is much more sensible:
tf.Tensor(
[[7. 6. 0. 0. 1.]
[1. 2. 3. 0. 0.]
[0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]
[-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]
[-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]], shape=(3, 5), dtype=float32)
Here you can see the negative-infinity stand-ins (-1e9) occur at exactly the spots corresponding to zeros in x.
Perhaps the new dimension is needed later in the notebook. I haven’t gotten that far. But at this section:
print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9))
It doesn’t make sense.
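For what it's worth, here is a minimal sketch of the "needed later" hypothesis, assuming the mask is eventually added to attention logits of shape (batch, seq_len_q, seq_len_k), which is how transformer attention typically consumes it. Against that shape, the size-1 axis broadcasts over the query dimension instead of creating a new one:
import tensorflow as tf

batch, seq_len = 3, 5
# Illustrative attention logits: one score per (query, key) pair
logits = tf.random.uniform((batch, seq_len, seq_len))  # (batch, seq_len_q, seq_len_k)

# The padding pattern of x above, with the extra axis: (batch, 1, seq_len_k)
mask = tf.constant([[[1., 1., 0., 0., 1.]],
                    [[1., 1., 1., 0., 0.]],
                    [[0., 0., 0., 1., 1.]]])

# The size-1 axis broadcasts over the query dimension: every query row
# in a sequence masks out the same padded key positions.
masked = logits + (1 - mask) * -1.0e9
print(masked.shape)   # (3, 5, 5): same rank as logits, no extra axis appears

weights = tf.keras.activations.softmax(masked)  # padded keys get ~0 attention weight
print(weights.shape)  # (3, 5, 5)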