I wrote the code above and got the following results.

It seems to me that something is wrong with my code, and as a result the attention weights differ from the expected result, but I can't tell where the mistake is. Please help.

I recommend you read the instructions for Exercise 3 carefully for how to apply the mask.

For the attention_weights, I recommend you use the tf.keras.activations.softmax(…) function.

For output, you should compute the matrix product of attention_weights and v.
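Putting those three suggestions together, a minimal sketch of scaled dot-product attention might look like the following. This is not the official exercise solution; in particular, the mask convention here (1 = keep, 0 = mask out, applied by adding a large negative number before the softmax) is an assumption, so check it against the Exercise 3 instructions.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (..., seq_len, depth); mask (assumed): 1 = keep, 0 = mask out
    matmul_qk = tf.matmul(q, k, transpose_b=True)          # (..., seq_len_q, seq_len_k)

    # Scale by sqrt(dk) to keep the logits in a reasonable range
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    if mask is not None:
        # Add a large negative value at masked positions so their
        # softmax weight is effectively zero (assumed convention)
        scaled_attention_logits += (1.0 - mask) * -1e9

    # Softmax over the last axis (the keys), as suggested above
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits)

    # Output is the matrix product of the weights and v
    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```

Each row of `attention_weights` should sum to 1, and positions where the mask is 0 should get (near-)zero weight; those two properties are a quick sanity check for your own implementation.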

Thank you for your suggestion.