Since the input for the BERT model (a transformer encoder) looks like the following during masked language modeling:
input example: Thank you [MASK] me to your party [MASK] week
output example: inviting, this
Should we add a train_mask to this training data, with mask = 1 only at the masked positions and 0 elsewhere, so that when calculating the loss we only count the loss of the predicted masked words?
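To make the question concrete, here is a minimal PyTorch sketch of what I mean. All sizes, token ids, and positions are made up for illustration; in real training the logits would come from the model's MLM head:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: one sequence of 8 tokens, BERT-base-like vocab.
batch_size, seq_len, vocab_size = 1, 8, 30522

# Random values stand in for the MLM head's logits here.
logits = torch.randn(batch_size, seq_len, vocab_size)

# targets hold the original token id at every position (random ids
# here for illustration).
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# train_mask is 1 at the masked positions (2 and 6, i.e. "inviting"
# and "this" in the example above) and 0 everywhere else.
train_mask = torch.zeros(batch_size, seq_len)
train_mask[0, 2] = 1.0
train_mask[0, 6] = 1.0

# Per-token loss, then zero out the unmasked positions and average
# over the masked ones only.
per_token = F.cross_entropy(
    logits.view(-1, vocab_size), targets.view(-1), reduction="none"
).view(batch_size, seq_len)
loss = (per_token * train_mask).sum() / train_mask.sum()
```

An equivalent alternative would be to fold the mask into the labels by setting unmasked positions to -100, which `F.cross_entropy` ignores by default via its `ignore_index` argument.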