In 2.1 - Padding Mask, the assignment shows the difference between computing softmax directly and computing softmax after adding a large negative value (standing in for negative infinity) at the padded positions:
print(tf.keras.activations.softmax(x))
print(tf.keras.activations.softmax(x + create_padding_mask(x) * -1.0e9))
The second line of code has a shape problem: x and the mask returned by create_padding_mask(x) have different ranks, so they cannot be added element-wise as intended.
It should be changed to the following:
print(tf.keras.activations.softmax(x[:, tf.newaxis, tf.newaxis, :] + create_padding_mask(x) * -1.0e9))
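To make the shape issue concrete, here is a minimal sketch. It assumes create_padding_mask returns a float mask of shape (batch, 1, 1, seq_len) with 1.0 at positions where the token is 0; that definition and the sample values of x are assumptions for illustration and may differ from your version of the notebook. Under that assumption the direct addition does not raise an error, but it broadcasts to a 4-D tensor in which every sequence is combined with every mask, which is presumably not what was intended.

```python
import tensorflow as tf

def create_padding_mask(seq):
    # Assumed definition (not copied from the assignment):
    # 1.0 where the token is 0 (padding), shape (batch, 1, 1, seq_len).
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]

# Hypothetical sample batch: 3 sequences of length 5, with 0 as padding.
x = tf.constant([[7., 6., 0., 0., 1.],
                 [1., 2., 3., 0., 0.],
                 [0., 0., 0., 4., 5.]])

mask = create_padding_mask(x)
print(x.shape, mask.shape)

# Direct addition: (3, 5) is broadcast against (3, 1, 1, 5),
# so each sequence gets paired with every mask in the batch.
mixed = x + mask * -1.0e9
print(mixed.shape)

# Expanding x first keeps each sequence paired with its own mask.
aligned = x[:, tf.newaxis, tf.newaxis, :] + mask * -1.0e9
print(aligned.shape)

# Padded positions receive (numerically) zero probability.
probs = tf.keras.activations.softmax(aligned)
print(probs)
```

If your create_padding_mask returns a different shape, for example (batch, 1, seq_len), the number of tf.newaxis entries used to expand x would need to change to match.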