C5_W4_A1_Transformer_Subclass_v1 DecoderLayer Class why not "use_causal_mask"

In the DecoderLayer class exercise, when calculating mult_attn_out1 and attn_weights_block1, why don't we see "use_causal_mask=True" in the call arguments for the look-ahead mask?

Hi @Maggie_Zhang3

use_causal_mask : A boolean to indicate whether to apply a causal mask to prevent tokens from attending to future tokens

If you check the decoder layer's call function, one of its arguments is:
look_ahead_mask – Boolean mask for the target_input

A look-ahead mask is required to prevent the decoder from attending to succeeding words, such that the prediction for a particular word can only depend on known outputs for the words that come before it.

So since we are already passing look_ahead_mask to attention block 1, use_causal_mask=True is not required here; the explicit mask serves the same purpose.
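To make the equivalence concrete, here is a minimal numpy sketch (not the assignment's exact code) showing what an explicit look-ahead mask does inside attention: positions can only attend to themselves and earlier positions, which is exactly what use_causal_mask=True would enforce. The variable names and the -1e9 masking constant are illustrative choices, not part of the assignment.

```python
import numpy as np

# Look-ahead mask: mask[i, j] == 1 means query position i may attend
# to key position j. Lower-triangular, so only j <= i is allowed.
seq_len = 4
look_ahead_mask = np.tril(np.ones((seq_len, seq_len)))

# Toy attention scores (e.g. scaled QK^T) for a single head.
rng = np.random.default_rng(0)
scores = rng.normal(size=(seq_len, seq_len))

# Masked softmax: add a large negative value where the mask is 0,
# so future positions receive ~0 attention weight.
masked_scores = scores + (1.0 - look_ahead_mask) * -1e9
weights = np.exp(masked_scores - masked_scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each row still sums to 1, and all future (upper-triangular)
# attention weights are zero.
print(np.allclose(weights.sum(axis=-1), 1.0))
print(np.allclose(np.triu(weights, k=1), 0.0))
```

Passing use_causal_mask=True to Keras' MultiHeadAttention builds this same lower-triangular mask internally, so supplying look_ahead_mask explicitly, as the assignment does, makes the flag redundant.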