Hi, in the final assignment, in the DecoderLayer graded cell:
# UNQ_C6 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION DecoderLayer
The first block of the call function is:
# BLOCK 1
# calculate self-attention and return attention scores as attn_weights_block1.
# Dropout will be applied during training (~1 line).
mult_attn_out1, attn_weights_block1 = self.mha1(None, None, None, None, return_attention_scores=True)
However, the Keras page says there's another parameter, training, after return_attention_scores. Should we include that as well? I think it should be required, since the layer applies dropout. Thanks!
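For reference, here is a small standalone sketch of what I mean (independent of the assignment code; the layer sizes and tensor names are just illustrative), showing that tf.keras.layers.MultiHeadAttention accepts a training argument in its call and that it toggles the internal dropout:

```python
import tensorflow as tf

# Illustrative layer with dropout enabled so the effect of `training` is visible.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4, dropout=0.5)

x = tf.random.uniform((1, 3, 4))  # (batch, seq_len, embedding_dim)

# training=True: dropout on the attention weights is active,
# so repeated calls generally give different outputs.
out_a, _ = mha(x, x, x, return_attention_scores=True, training=True)
out_b, _ = mha(x, x, x, return_attention_scores=True, training=True)

# training=False: dropout is disabled, so the output is deterministic.
out_c, _ = mha(x, x, x, return_attention_scores=True, training=False)
out_d, _ = mha(x, x, x, return_attention_scores=True, training=False)

print(tf.reduce_any(out_a != out_b).numpy())  # usually True (dropout active)
print(tf.reduce_all(out_c == out_d).numpy())  # True (dropout off)
```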