Hello, I am unable to understand the attn_output operation in exercise 4. The instructions say that I have to pass Q, K, V, and a mask, together with the default value of return_attention_scores, but I cannot work out what that value means. If it is related to the scaled_dot_product_attention function, then how can we use that function along with mha as suggested in the comment right above the code?
I recommend you check the API doc for the MultiHeadAttention call signature: return_attention_scores is documented there. It is a boolean argument of the layer's call, not of scaled_dot_product_attention; when set to True, the layer returns the attention weights in addition to the output.
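To make the call signature concrete, here is a minimal sketch of invoking a Keras MultiHeadAttention layer with return_attention_scores=True. The shapes (batch, sequence length, embedding dimension) and the all-ones mask are made-up values for illustration, not the exercise's actual dimensions:

```python
import tensorflow as tf

# Hypothetical toy shapes, not the assignment's real ones.
batch, seq_len, embed_dim = 1, 4, 8

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

q = tf.random.uniform((batch, seq_len, embed_dim))
k = tf.random.uniform((batch, seq_len, embed_dim))
v = tf.random.uniform((batch, seq_len, embed_dim))
mask = tf.ones((batch, seq_len, seq_len))  # 1 = attend, 0 = ignore

# With return_attention_scores=True the call returns a tuple:
# the attended output AND the per-head attention weights.
attn_output, attn_scores = mha(
    query=q, value=v, key=k,
    attention_mask=mask,
    return_attention_scores=True,
)

print(attn_output.shape)  # (batch, seq_len, embed_dim) -> (1, 4, 8)
print(attn_scores.shape)  # (batch, num_heads, seq_len, seq_len) -> (1, 2, 4, 4)
```

If you leave return_attention_scores at its default (False), the layer returns only attn_output, so unpacking into two variables would fail.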