A112
August 20, 2022, 11:25pm
1
Hello everyone!

Notice that when the padding mask is applied, the shape of the output changes (this happens between exercises 2 and 3 of C5W5Asn1).
That seems incorrect to me. If I'm mistaken, please explain!

```
# Softmax original sequence
tf.keras.activations.softmax(
x
).numpy().shape
```

out → `(3, 5)`

```
# Softmax masked sequence
tf.keras.activations.softmax(
x + (1 - create_padding_mask(x)) * -1.0e9
).numpy().shape
```

out → `(3, 3, 5)`

TMosh
August 24, 2022, 5:22am
2
The additional axis appears because create_padding_mask() inserts a new axis into the mask before returning it.
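To see why that extra axis changes the output shape, here is a small NumPy sketch. The mask logic below is my reconstruction of what the assignment's create_padding_mask appears to compute, inferred from the printed outputs in this thread (1 for real tokens, 0 for padding), so treat it as an assumption:

```python
import numpy as np

# x from the thread: zeros mark padding positions
x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)

# Reconstructed mask (assumption): 1 for real tokens, 0 for padding
mask = (x != 0).astype(np.float32)             # shape (3, 5)
mask_newaxis = mask[:, np.newaxis, :]          # shape (3, 1, 5)

# Without the new axis the shapes match, so no broadcasting happens
print((x + (1 - mask) * -1e9).shape)           # (3, 5)

# With the new axis, (3, 5) + (3, 1, 5) broadcasts to (3, 3, 5)
print((x + (1 - mask_newaxis) * -1e9).shape)   # (3, 3, 5)
```

The `(3, 3, 5)` shape in the original post is exactly this broadcast: each of the 3 rows of `x` gets paired with each of the 3 mask rows.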

I have the same question. I see that a new axis was inserted at the end of

```
def create_padding_mask(decoder_token_ids):
```

But why was it added?

If I print using the following

```
print(x)
print((1 - create_padding_mask(x)) * -1.0e9)
```

The output is the following:

```
tf.Tensor(
[[7. 6. 0. 0. 1.]
[1. 2. 3. 0. 0.]
[0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]
[-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]
[-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]], shape=(3, 5), dtype=float32)
```

It doesn't seem that broadcasting is desired here; it changed the shape of the output.

If I remove the axis insertion, the result is much more sensible:

```
tf.Tensor(
[[7. 6. 0. 0. 1.]
[1. 2. 3. 0. 0.]
[0. 0. 0. 4. 5.]], shape=(3, 5), dtype=float32)
tf.Tensor(
[[-0.e+00 -0.e+00 -1.e+09 -1.e+09 -0.e+00]
[-0.e+00 -0.e+00 -0.e+00 -1.e+09 -1.e+09]
[-1.e+09 -1.e+09 -1.e+09 -0.e+00 -0.e+00]], shape=(3, 5), dtype=float32)
```

Here you can see that the large negative values (-1e9, effectively negative infinity for softmax) land exactly at the positions corresponding to zeros in x.
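As a sanity check that the additive -1e9 mask behaves like negative infinity under softmax, here is a one-row sketch in plain NumPy (no TensorFlow needed):

```python
import numpy as np

row = np.array([7., 6., 0., 0., 1.], dtype=np.float32)
mask = (row != 0).astype(np.float32)     # 1 for real tokens, 0 for padding
logits = row + (1 - mask) * -1e9         # padded positions pushed to -1e9

# Numerically stable softmax
p = np.exp(logits - logits.max())
p /= p.sum()

print(p)  # probabilities at the padded positions are effectively zero
```

After masking, the remaining probability mass is distributed only over the real tokens, which is the whole point of the padding mask.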

Perhaps the new dimension is needed later in the notebook. I haven't gotten that far. But in this section:

```
print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9))
```

It doesn’t make sense.
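For what it's worth, the extra axis does pay off later in attention. The attention-score tensor has a query dimension that the mask lacks, and a `(batch, 1, seq_len)` mask broadcasts across every query position. A minimal sketch in NumPy, assuming the usual scaled dot-product attention shapes (the variable names here are my own, not from the notebook):

```python
import numpy as np

batch, seq_len, depth = 3, 5, 4
rng = np.random.default_rng(0)
q = rng.standard_normal((batch, seq_len, depth)).astype(np.float32)
k = rng.standard_normal((batch, seq_len, depth)).astype(np.float32)

# Attention scores: (batch, seq_len_q, seq_len_k)
scores = q @ k.transpose(0, 2, 1) / np.sqrt(depth)

# Padding mask with the extra axis: (batch, 1, seq_len_k)
tokens = np.array([[7, 6, 0, 0, 1],
                   [1, 2, 3, 0, 0],
                   [0, 0, 0, 4, 5]])
mask = (tokens != 0).astype(np.float32)[:, np.newaxis, :]

# Broadcasts cleanly: every query row reuses the same key mask
masked = scores + (1 - mask) * -1e9
print(masked.shape)  # (3, 5, 5)
```

In that context the broadcast is deliberate; it only looks wrong in the standalone softmax demo, where the scores happen to be 2-D.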
