Hi, in the final assignment, in the DecoderLayer graded cell:
# UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION DecoderLayer
The first block of the call function is:
# BLOCK 1
# calculate self-attention and return attention scores as attn_weights_block1.
# Dropout will be applied during training (~1 line).
mult_attn_out1, attn_weights_block1 = self.mha1(None, None, None, None, return_attention_scores=True)
However, the Keras page says there's another parameter, training, after return_attention_scores. Should we include that as well? I think it should be required, since the layer applies dropout. Thanks!
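For reference, here is a small standalone sketch of what I mean (independent of the assignment code; the layer sizes and tensor names are just illustrative), showing that tf.keras.layers.MultiHeadAttention accepts a training argument in its call and that it toggles the internal dropout:

```python
import tensorflow as tf

# Illustrative layer with dropout enabled so the effect of `training` is visible.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4, dropout=0.5)

x = tf.random.uniform((1, 3, 4))  # (batch, seq_len, embedding_dim)

# training=True: dropout on the attention weights is active,
# so repeated calls generally give different outputs.
out_a, _ = mha(x, x, x, return_attention_scores=True, training=True)
out_b, _ = mha(x, x, x, return_attention_scores=True, training=True)

# training=False: dropout is disabled, so the output is deterministic.
out_c, _ = mha(x, x, x, return_attention_scores=True, training=False)
out_d, _ = mha(x, x, x, return_attention_scores=True, training=False)

print(tf.reduce_any(out_a != out_b).numpy())  # usually True (dropout active)
print(tf.reduce_all(out_c == out_d).numpy())  # True (dropout off)
```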