Hey,

In the example in the notebook:

```
x = tf.constant([[7., 6., 0., 0., 1.], [1., 2., 3., 0., 0.], [0., 0., 0., 4., 5.]])
tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9)
```

The result is a tensor (let's call it T) of shape (3, 3, 5). Why does this tensor not satisfy T[:, 0, :] == T[:, 1, :] == T[:, 2, :]?
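For reference, here is a minimal runnable sketch. The definition of create_padding_mask below is my assumption (the notebook's helper isn't quoted here), chosen to be consistent with the (3, 1, 5) output shown further down: 1.0 for real tokens, 0.0 for padding, with the extra axis inserted at position 1.

```
import tensorflow as tf

def create_padding_mask(x):
    # Assumed behavior, matching the quoted output: 1.0 where x is non-zero
    # (real tokens), 0.0 where x is zero (padding), with a broadcastable
    # axis inserted at position 1 -> shape (batch, 1, seq_len).
    return tf.cast(tf.math.not_equal(x, 0), tf.float32)[:, tf.newaxis, :]

x = tf.constant([[7., 6., 0., 0., 1.],
                 [1., 2., 3., 0., 0.],
                 [0., 0., 0., 4., 5.]])

T = tf.keras.activations.softmax(x + (1 - create_padding_mask(x)) * -1.0e9)
print(T.shape)  # (3, 3, 5): (3, 5) broadcast against (3, 1, 5)
```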

Note that they add the new dimension as the second dimension, not the first. That makes the shape of the mask (3, 1, 5), as you can see in the output:

```
x = tf.constant([[7., 6., 0., 0., 1.], [1., 2., 3., 0., 0.], [0., 0., 0., 4., 5.]])
create_padding_mask(x)

tf.Tensor(
[[[1. 1. 0. 0. 1.]]

 [[1. 1. 1. 0. 0.]]

 [[0. 0. 0. 1. 1.]]], shape=(3, 1, 5), dtype=float32)
```
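To make that axis choice concrete (a sketch using the assumed helper above; the placement of tf.newaxis is the only thing that differs):

```
base = tf.cast(tf.math.not_equal(x, 0), tf.float32)  # (3, 5)
print(base[:, tf.newaxis, :].shape)  # (3, 1, 5): new axis second, as in the notebook
print(base[tf.newaxis, :, :].shape)  # (1, 3, 5): what inserting it first would give
```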

The individual rows are different because the inputs are different, right?

In the code I provided, a tensor of shape (3, 5) was added to a tensor of shape (3, 1, 5). The resulting tensor should be (3, 3, 5) because of broadcasting. I was actually trying to ask why the resulting tensor (let's call it T) did not satisfy T[:, 0, :] == T[:, 1, :]. But now I understand why…
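To spell out the broadcast with the assumed helper from above: x of shape (3, 5) is treated as (1, 3, 5), so T[i, j, :] combines row j of x with the mask built from row i. Slices along axis 1 therefore differ simply because the rows of x differ:

```
mask = create_padding_mask(x)  # (3, 1, 5), assumed helper from above
i, j = 2, 0
# Row j of x, masked with the padding pattern derived from row i:
expected = tf.nn.softmax(x[j] + (1 - mask[i, 0]) * -1.0e9)
print(tf.reduce_all(tf.abs(T[i, j] - expected) < 1e-6).numpy())  # True
print(tf.reduce_all(T[:, 0, :] == T[:, 1, :]).numpy())           # False: x[0] != x[1]
```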