Create_padding_mask() function

shaya_kahn · August 16, 2024, 5:15pm

Hey,

In the example in the notebook:

x = tf.constant([[7., 6., 0., 0., 1.], [1., 2., 3., 0., 0.], [0., 0., 0., 4., 5.]])
tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9)

The result is a tensor (let’s call it T) of shape (3, 3, 5). Why doesn’t this tensor satisfy T[:, 0, :] = T[:, 1, :] = T[:, 2, :]?

paulinpaloalto · August 16, 2024, 6:40pm

Note that they add the new dimension as the second dimension, not the first dimension. That makes the shape of the mask (3, 1, 5), as you see in the output:

x = tf.constant([[7., 6., 0., 0., 1.], [1., 2., 3., 0., 0.], [0., 0., 0., 4., 5.]])
print(create_padding_mask(x))
tf.Tensor(
[[[1. 1. 0. 0. 1.]]

 [[1. 1. 1. 0. 0.]]

 [[0. 0. 0. 1. 1.]]], shape=(3, 1, 5), dtype=float32)

The individual rows are different because the inputs are different, right?
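
For anyone who wants to reproduce that output without the notebook, here is a minimal sketch. It uses NumPy rather than TensorFlow, and padding_mask_sketch is a hypothetical stand-in written to match the printed output above, not the notebook's actual create_padding_mask:

```python
import numpy as np

def padding_mask_sketch(token_ids):
    """Hypothetical stand-in: 1.0 for non-zero entries, 0.0 for padding (zeros)."""
    mask = (np.asarray(token_ids) != 0).astype(np.float32)
    # The new axis goes in the second position: (batch, 1, seq_len)
    return mask[:, np.newaxis, :]

x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)

mask = padding_mask_sketch(x)
print(mask.shape)     # (3, 1, 5)
print(mask[:, 0, :])  # one mask row per input row
```

Each input row gets its own mask row because the zeros sit in different positions in each row.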

shaya_kahn · August 16, 2024, 8:16pm

In the code I provided, a tensor of shape (3, 5) is added to a tensor of shape (3, 1, 5). Because of broadcasting, the resulting tensor has shape (3, 3, 5). What I was actually trying to ask is why the resulting tensor (let’s call it T) does not satisfy T[:, 0, :] = T[:, 1, :]. But now I understand why…
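
The broadcasting step can be checked with a small sketch. It assumes NumPy in place of TensorFlow (the broadcasting rules are the same) and rebuilds the mask directly from the printed output instead of calling the notebook's function; the names penalty and logits are just illustrative:

```python
import numpy as np

x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)

# Assumed to match the mask printed earlier in the thread: shape (3, 1, 5)
mask = (x != 0).astype(np.float32)[:, np.newaxis, :]
penalty = (1.0 - mask) * -1.0e9   # shape (3, 1, 5)

# x (3, 5) is treated as (1, 3, 5), so the sum has shape (3, 3, 5)
logits = x + penalty
print(logits.shape)  # (3, 3, 5)

# Entry [i, j] combines row j of x with the penalty from mask i
i, j = 0, 1
print(np.array_equal(logits[i, j], x[j] + penalty[i, 0]))  # True
```

So the middle index picks the row of x and the first index picks the mask, which is why slices with a different middle index do not have to agree.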

nadtriana · August 16, 2024, 8:50pm

The reason T[:, 0, :] differs from T[:, 1, :] and from T[:, 2, :] is that broadcasting pairs every row of x with every mask. x has shape (3, 5) and is treated as (1, 3, 5), while (1 - create_padding_mask(x)) * -1.0e9 has shape (3, 1, 5), so T[i, j, :] is the softmax of row j of x after the large negative values from mask i have been added. Those large negative values effectively “zero out” the masked positions, since they receive essentially zero probability after the softmax. For a fixed j, T[:, j, :] therefore always uses the same row of x, combined with each of the three masks in turn. The rows of x hold different values, so even where the same mask is applied, the non-masked values differ and the softmax outputs differ. Each slice T[:, j, :] along the second dimension is thus a softmax over a different set of numbers, which gives different distributions.
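
A quick numerical check of this, as a sketch only: it assumes NumPy instead of TensorFlow, a hand-written softmax_last_axis helper (hypothetical, standing in for tf.keras.activations.softmax over the last axis), and a mask rebuilt from the output printed earlier in the thread:

```python
import numpy as np

def softmax_last_axis(z):
    """Numerically stable softmax over the last axis (illustrative helper)."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

x = np.array([[7., 6., 0., 0., 1.],
              [1., 2., 3., 0., 0.],
              [0., 0., 0., 4., 5.]], dtype=np.float32)
mask = (x != 0).astype(np.float32)[:, np.newaxis, :]  # shape (3, 1, 5)

T = softmax_last_axis(x + (1.0 - mask) * -1.0e9)      # shape (3, 3, 5)

# The slices use different rows of x, so they are not equal
print(np.allclose(T[:, 0, :], T[:, 1, :]))  # False

# Masked positions (here indices 2 and 3 under mask 0) get ~0 probability
print(T[0, 0])
```

T[i, i, :] is the case where a row of x is combined with its own mask; the other entries mix a row with the mask of a different row.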
