C5_W4_A1_Transformer_Subclass_v1 DecoderLayer Class why not "use_causal_mask"

In the DecoderLayer class exercise, when calculating mult_attn_out1 and attn_weights_block1, why don't we see "use_causal_mask=True" in the call arguments for the look-ahead mask?

Hi @Maggie_Zhang3

use_causal_mask : A boolean to indicate whether to apply a causal mask to prevent tokens from attending to future tokens

If you check the decoder layer's call function, one of its arguments is:
look_ahead_mask – Boolean mask for the target_input

A look-ahead mask is required to prevent the decoder from attending to succeeding words, such that the prediction for a particular word can only depend on known outputs for the words that come before it.

So since we are already passing look_ahead_mask to attention block 1, use_causal_mask=True is not required here; the explicit mask serves the same purpose.
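To make the equivalence concrete, here is a minimal numpy sketch (not the assignment's exact code) showing what an explicit look-ahead mask does inside attention: positions can only attend to themselves and earlier positions, which is exactly what use_causal_mask=True would enforce. The variable names and the -1e9 masking constant are illustrative choices, not part of the assignment.

```python
import numpy as np

# Look-ahead mask: mask[i, j] == 1 means query position i may attend
# to key position j. Lower-triangular, so only j <= i is allowed.
seq_len = 4
look_ahead_mask = np.tril(np.ones((seq_len, seq_len)))

# Toy attention scores (e.g. scaled QK^T) for a single head.
rng = np.random.default_rng(0)
scores = rng.normal(size=(seq_len, seq_len))

# Masked softmax: add a large negative value where the mask is 0,
# so future positions receive ~0 attention weight.
masked_scores = scores + (1.0 - look_ahead_mask) * -1e9
weights = np.exp(masked_scores - masked_scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each row still sums to 1, and all future (upper-triangular)
# attention weights are zero.
print(np.allclose(weights.sum(axis=-1), 1.0))
print(np.allclose(np.triu(weights, k=1), 0.0))
```

Passing use_causal_mask=True to Keras' MultiHeadAttention builds this same lower-triangular mask internally, so supplying look_ahead_mask explicitly, as the assignment does, makes the flag redundant.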