How is the mask in the training generator used?

YIHUI · August 10, 2022, 3:51am

I assume this mask is used when computing cross entropy during training: since we set the mask to 1 for the summary part, cross entropy is computed only for the summary part. But I did not find any part of the scripts indicating this… Or is it already handled in the source code when the training data (sentence, sentence, mask) is passed to the model?

arvyzukai · August 10, 2022, 8:00am

Yes, you are correct: this is not visible in the assignment code. It is handled inside the TrainTask (in UNQ_C8 you pass train_gen (or train_batch_stream), which carries this mask).
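For intuition, here is a minimal sketch in plain Python of what a mask-weighted cross entropy does. It is not the library's source code; `log_softmax` and `masked_cross_entropy` are hypothetical helpers written for illustration, and the toy logits, targets, and mask are made up.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one position's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def masked_cross_entropy(logits_seq, targets, mask):
    # Sum the per-token negative log-likelihood weighted by the mask,
    # then normalize by the total mask weight.
    total = 0.0
    weight = 0.0
    for logits, target, w in zip(logits_seq, targets, mask):
        total += -log_softmax(logits)[target] * w
        weight += w
    return total / weight if weight > 0 else 0.0

# Toy sequence: the first two positions stand in for the article (mask 0),
# the last two for the summary (mask 1).
logits_seq = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [1, 0, 2, 1]
mask = [0, 0, 1, 1]

loss = masked_cross_entropy(logits_seq, targets, mask)
print(loss)
```

Because the mask is 0 on the article positions, changing the logits there leaves the loss unchanged; only the summary positions contribute.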
