The exercise’s hint / comment is:

softmax is normalized on the last axis (seq_len_k) so that the scores add up to 1.

However, I cannot pass this exercise, and I suspect that I have got the softmax wrong.
Is there any hint or reference documentation on how to select the last axis? I used an index of 0 on scaled_attention_logits, but I still cannot pass the test.

  1. For the code line "Multiply q and k transposed":
    The transpose part is written incorrectly. Pass q and k to tf.matmul and then set transpose_b=True.

  2. For the code line "scale matmul_qk with the square root of dk":
    You read the dimension of k incorrectly. You used k.shape, but you are supposed to use tf.shape(k). The rest of your code line is correct; just use tf.shape(k)[-1].

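  3. To add the mask to the scaled tensor, the instructions given are:

  • Multiply (1. - mask) by -1e9 before adding it to the scaled attention logits.
    You added just the mask, which is incorrect.

  4. For the softmax:
    You do not need to take the length of scaled_attention_logits, so please remove it.
    Apply tf.keras.activations.softmax to scaled_attention_logits directly, without the [0] index. Indexing with [0] selects the first element along the first axis; it does not select the last axis. The softmax axis argument defaults to -1, which is the last axis (seq_len_k).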
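
For anyone reading along, the corrections above can be put together roughly as follows. This is only a minimal sketch, not the graded solution: the function signature, the toy tensor shapes, and the example mask are my own assumptions for illustration, and the mask convention (1 = keep, 0 = mask out) is inferred from the "(1. - mask)" instruction.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    # Signature is an assumption; the exercise's own template may differ.
    # Step 1: multiply q by k transposed -> (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Step 2: scale by sqrt(dk), where dk is the last dimension of k
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Step 3: masked positions (mask == 0) receive a large negative value
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # Step 4: softmax over the last axis (seq_len_k), no [0] indexing
    attention_weights = tf.keras.activations.softmax(
        scaled_attention_logits, axis=-1
    )

    # Weighted sum of the values -> (..., seq_len_q, depth_v)
    output = tf.matmul(attention_weights, v)
    return output, attention_weights

# Toy shapes chosen only for this illustration
q = tf.random.uniform((2, 3, 4))   # (batch, seq_len_q, depth)
k = tf.random.uniform((2, 5, 4))   # (batch, seq_len_k, depth)
v = tf.random.uniform((2, 5, 6))   # (batch, seq_len_k, depth_v)
mask = tf.constant([[[1.0, 1.0, 1.0, 0.0, 0.0]]])  # broadcasts over batch and seq_len_q

output, weights = scaled_dot_product_attention(q, k, v, mask)
row_sums = tf.reduce_sum(weights, axis=-1)  # each row should sum to about 1
```

Summing the weights over axis=-1 is a quick way to check that the softmax was normalized along seq_len_k, which is what the exercise hint asks for.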

Thanks! I can pass the test now and can proceed to the next exercise.
Dear Deepti_Prased,
For your information.


