If you get an error from the tf.keras.activations.softmax() function saying “'tuple' object has no attribute 'rank'”, make sure you used tf.matmul(), not np.matmul(), when computing Q*K’.
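Both functions compute the same matrix product, but tf.matmul() returns a tensor, while np.matmul() returns a NumPy array. The Keras softmax() function reads the rank of its input from x.shape.rank. A tensor's shape supports this, but a NumPy array's shape is a plain tuple with no rank attribute, which is where the error comes from.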
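Here is a minimal sketch of the difference, assuming TensorFlow 2.x with the bundled tf.keras. The shapes and the names q, k, scores_np, and scores_tf are made up for illustration. Under Keras 3, softmax may accept NumPy input without complaint, so the error itself may not reproduce there.

```python
import numpy as np
import tensorflow as tf

# Hypothetical query/key batches: (batch, seq_len, depth)
rng = np.random.default_rng(0)
q = rng.random((1, 3, 4)).astype(np.float32)
k = rng.random((1, 3, 4)).astype(np.float32)

# np.matmul returns a NumPy ndarray; its .shape is a plain tuple (no .rank)
scores_np = np.matmul(q, np.transpose(k, (0, 2, 1)))

# tf.matmul returns a tf.Tensor; transpose_b=True computes Q*K' without
# an explicit transpose of k
scores_tf = tf.matmul(q, k, transpose_b=True)

# Safe: the Keras softmax receives a tensor, so x.shape.rank exists
weights = tf.keras.activations.softmax(scores_tf, axis=-1)

# If you already have a NumPy result, converting it also works
weights_from_np = tf.keras.activations.softmax(
    tf.convert_to_tensor(scores_np), axis=-1
)
```

Using transpose_b=True keeps the whole computation in TensorFlow, so the result stays a tensor all the way into softmax(). Wrapping a NumPy result in tf.convert_to_tensor() is a workaround, but staying with tf.matmul() avoids the round trip.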