C5 Week 4, Transformer subclass v1

Hello, I am unable to understand the attn_output operation in Exercise 4. The instructions say to pass Q, K, V, and the mask, with return_attention_scores left at its default value, but I cannot work out what that value is. If it is related to the scaled_dot_product_attention function, then how can we use that function together with mha, as the comment right above the code suggests?


I recommend you check out the API doc for the MultiHeadAttention call signature; return_attention_scores is documented there.
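
To make the signature concrete, here is a minimal sketch (not the assignment's code; the layer configuration and tensor shapes are illustrative assumptions) of calling a Keras MultiHeadAttention layer. The layer computes scaled dot-product attention internally for each head, so you don't call your own scaled_dot_product_attention function here:

```python
import tensorflow as tf

# Illustrative configuration, not the assignment's values.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

batch, seq_len, d_model = 1, 5, 64
q = tf.random.uniform((batch, seq_len, d_model))  # query
k = tf.random.uniform((batch, seq_len, d_model))  # key
v = tf.random.uniform((batch, seq_len, d_model))  # value
mask = tf.ones((batch, seq_len, seq_len))         # attention_mask: 1 = attend

# return_attention_scores defaults to False, in which case the call
# returns only the attention output. Passing True makes it return a
# tuple of (attention_output, attention_scores) instead.
attn_output, attn_scores = mha(
    query=q, value=v, key=k,
    attention_mask=mask,
    return_attention_scores=True,
)

print(attn_output.shape)  # (1, 5, 64)
print(attn_scores.shape)  # (1, 2, 5, 5): (batch, num_heads, seq_q, seq_k)
```

So "default value of return_attention_scores" just means leaving that argument out (or passing False), in which case attn_output is the only return value.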