I am getting quite confused with the implementation of Ex3. My implementation of this method throws the “wrong masked weights” assertion error:

Thanks for the reply.
I am using the following TensorFlow functions:

tf.linalg.matmul

tf.cast(), tf.shape()

tf.math.sqrt()

tf.nn.softmax() – also tried tf.keras.layers.Softmax(axis)(logits) with the same result
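For reference, here is a minimal sketch of a tutorial-style method built from exactly those functions. The mask convention used here (1 = masked position, added as `mask * -1e9`) follows the TensorFlow tutorial and is an assumption; some assignments instead expect `(1. - mask) * -1e9` with the opposite convention, and mixing the two up is a common cause of exactly this kind of masked-weights assertion failure.

```python
import numpy as np
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # (..., seq_len_q, seq_len_k) attention logits.
    matmul_qk = tf.linalg.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k), the key depth.
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Additive masking: positions where mask == 1 get a large
    # negative value, so their weight vanishes after softmax.
    # (Assumed convention; your assignment may use (1. - mask) * -1e9.)
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    # Softmax over the last axis (seq_len_k) so each row sums to 1.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)
    output = tf.linalg.matmul(attention_weights, v)
    return output, attention_weights
```

With the convention above, masked columns of `attention_weights` should come out as (near) zeros and each row should still sum to 1; if your assertion checks the opposite pattern, flipping the mask term is the first thing to try.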

I have also tried restarting the kernel just in case.
My method is basically the same as the one presented in the TensorFlow tutorial for the Transformer model.

The attention weights output by my implementation are:

Using tf.keras.activations.softmax() also resulted in the same output.

Interestingly, I didn’t get a syntax error with tf.keras.layers.Softmax(axis)(logits), but I nevertheless tried tf.keras.layers.Softmax()(logits) as well (since axis=-1 is the default anyway).
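For what it’s worth, the various softmax entry points mentioned above should be numerically interchangeable when no mask is involved; a quick sanity check (the logits values are made up):

```python
import numpy as np
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])

a = tf.nn.softmax(logits, axis=-1)
b = tf.keras.layers.Softmax(axis=-1)(logits)
c = tf.keras.layers.Softmax()(logits)     # axis=-1 is the default
d = tf.keras.activations.softmax(logits)  # also defaults to axis=-1
```

All four tensors are equal up to floating-point noise, so swapping between them would not change the assertion result: the problem has to be upstream, in the logits (i.e., the masking), not in the softmax call.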

Having no idea what else to try, I copy-pasted the relevant code from the TensorFlow tutorial, but the method’s output is still exactly the same.

Yes, that is what I have done with the method you suggested. What I currently have is: attention_weights = tf.keras.activations.softmax(scaled_attention_logits)
This still results in the assertion error, since the output is still

Please note: providing the inputs in the call (rather than the constructor) is shown in the documentation for tf.keras.layers.Softmax():
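A minimal sketch of that call-time usage (the logits and mask values here are made up): the layer also accepts an optional boolean mask as a second call argument, where True means keep and False means suppress, per the layer documentation.

```python
import numpy as np
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
# Boolean mask passed at call time: True = keep, False = suppress.
mask = tf.constant([[True, True, False]])

weights = tf.keras.layers.Softmax()(logits, mask)
```

Internally the layer adds a large negative value to the False positions before applying softmax, so those positions end up with (near) zero weight while the remaining weights still sum to 1.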