C5 Week 4, Transformer subclass v1

Hello, I am unable to understand the attn_output operation in Exercise 4. The instructions say to pass Q, K, V, and the mask, with return_attention_scores left at its default value, but I cannot work out what that value is. If it is related to the scaled_dot_product_attention function, then how can we use that function together with mha, as the comment right above the code suggests?


I recommend you check out the API doc for the MultiHeadAttention call signature; return_attention_scores is documented there.
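
To make the signature concrete, here is a minimal sketch (not the assignment's code; the layer configuration and tensor shapes are illustrative assumptions) of calling a Keras MultiHeadAttention layer. The layer computes scaled dot-product attention internally for each head, so you don't call your own scaled_dot_product_attention function here:

```python
import tensorflow as tf

# Illustrative configuration, not the assignment's values.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=64)

batch, seq_len, d_model = 1, 5, 64
q = tf.random.uniform((batch, seq_len, d_model))  # query
k = tf.random.uniform((batch, seq_len, d_model))  # key
v = tf.random.uniform((batch, seq_len, d_model))  # value
mask = tf.ones((batch, seq_len, seq_len))         # attention_mask: 1 = attend

# return_attention_scores defaults to False, in which case the call
# returns only the attention output. Passing True makes it return a
# tuple of (attention_output, attention_scores) instead.
attn_output, attn_scores = mha(
    query=q, value=v, key=k,
    attention_mask=mask,
    return_attention_scores=True,
)

print(attn_output.shape)  # (1, 5, 64)
print(attn_scores.shape)  # (1, 2, 5, 5): (batch, num_heads, seq_q, seq_k)
```

So "default value of return_attention_scores" just means leaving that argument out (or passing False), in which case attn_output is the only return value.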