In the function "def NMTAttn", inside the tl.Serial combinator, we have the following line:
# Step 2: copy input tokens and target tokens as they will be needed later
Following that hint/instruction, we use the tl.Select layer in a way that ignores or drops the mask weights, which were added to the input stream by "trax.data.AddLossWeights(id_to_mask=0)(train_batch_stream)".
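To make the question concrete, here is a minimal plain-Python sketch of how I understand tl.Select's stack semantics (assuming Trax's documented behavior: it picks the given indices off the top of the data stack, with n_in defaulting to max(indices) + 1, and leaves deeper stack items untouched). This is an illustrative re-implementation, not Trax's actual code:

```python
def select(stack, indices, n_in=None):
    """Mimic tl.Select: pick `indices` from the top n_in stack items,
    then pass the rest of the stack through unchanged."""
    if n_in is None:
        n_in = max(indices) + 1  # Trax's documented default
    picked = [stack[i] for i in indices]
    return picked + list(stack[n_in:])

# Hypothetical stream: (input_tokens, target_tokens, mask_weights)
stream = ["input_tokens", "target_tokens", "mask_weights"]
print(select(stream, [0, 1, 0, 1]))
# -> ['input_tokens', 'target_tokens', 'input_tokens', 'target_tokens', 'mask_weights']
```

Interestingly, under these semantics a Select([0, 1, 0, 1]) only consumes the top two items, so whether the mask weights are truly deleted depends on whether they are on the model's stack at all at that point, which is part of what I am asking.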
After deleting that from the input stream, we add the same mask (1s for non-pad tokens and 0s for pad tokens) back via the function "prepare_attention_input" later on in the tl.Serial.
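By "the same mask" I mean a padding mask built directly from the token ids, roughly like the sketch below (the function name make_padding_mask and pad id 0 are my assumptions; the assignment's prepare_attention_input builds something equivalent for the attention layer):

```python
import numpy as np

def make_padding_mask(token_ids, pad_id=0):
    """Return 1.0 for real tokens and 0.0 for padding tokens."""
    return (np.asarray(token_ids) != pad_id).astype(np.float32)

print(make_padding_mask([5, 9, 3, 0, 0]))
# -> [1. 1. 1. 0. 0.]
```

Since id_to_mask=0 in AddLossWeights also zeroes out pad positions, the weights dropped earlier and the mask rebuilt here look identical to me.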
Two questions:
- Am I right in assuming that we are actually deleting the mask weights and then adding the same weights back later in the tl.Serial?
- If so, is that just a pedagogical device to teach more about Trax, or is there an implementation reason why the mask is deleted/ignored and then rebuilt?