Assignment - Mask padding before training

Why do we have to mask the padding in the loss weights of our data, using the id_to_mask argument of trax.data.inputs.add_loss_weights? We didn't do this previously.

# Create training data, mask pad id=35180 for training.
train_generator = trax.data.inputs.add_loss_weights(
    data_generator(batch_size, t_sentences, t_labels, vocab['<PAD>'], True),
    id_to_mask=vocab['<PAD>'])

# Create validation data, mask pad id=35180 for validation.
eval_generator = trax.data.inputs.add_loss_weights(
    data_generator(batch_size, v_sentences, v_labels, vocab['<PAD>'], True),
    id_to_mask=vocab['<PAD>'])
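For context, here is roughly what one batch from such a generator looks like (a toy sketch with made-up ids and sequence lengths, not the actual assignment data): the (inputs, targets) pairs become (inputs, targets, weights) triples, with weight 0 wherever the target equals the pad id.

import numpy as np
import trax

PAD_ID = 35180  # vocab['<PAD>'] in the assignment

# A toy generator that yields one (inputs, targets) batch of two padded sequences.
def toy_batches():
    x = np.array([[12, 7, 9, PAD_ID, PAD_ID],
                  [3, 4, PAD_ID, PAD_ID, PAD_ID]])
    y = np.array([[1, 2, 0, PAD_ID, PAD_ID],
                  [5, 6, PAD_ID, PAD_ID, PAD_ID]])
    while True:
        yield (x, y)

weighted = trax.data.inputs.add_loss_weights(toy_batches(), id_to_mask=PAD_ID)
inputs, targets, weights = next(weighted)
print(weights)
# [[1. 1. 1. 0. 0.]
#  [1. 1. 0. 0. 0.]]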

Regards,

From my understanding (I recently got access to the NLP specialization too), you need to mask the padding because the inputs and outputs must have a fixed length, so if a sentence is shorter than that fixed length, the positions beyond the end of the sentence are filled with the pad token.

When the time comes to calculate the loss, those pads will all be the same and they won't contribute to the overall loss.
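For example, a rough sketch of that padding step (with made-up token ids and a made-up fixed length):

import numpy as np

PAD_ID = 35180   # assumed pad token id, as in this week's assignment
MAX_LEN = 6      # assumed fixed sequence length for the batch

# Two tokenized sentences of different lengths (made-up ids).
sentences = [[12, 7, 9], [3, 4, 15, 8, 2]]

# Pad each sentence on the right with PAD_ID up to MAX_LEN.
padded = np.array([s + [PAD_ID] * (MAX_LEN - len(s)) for s in sentences])
print(padded)
# [[   12     7     9 35180 35180 35180]
#  [    3     4    15     8     2 35180]]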

Hi @gent.spah

That is true. But @Aaditya1's question was why it was not done in the previous week.

I had no time to check, but the probable cause could be that the padding token id in the previous week was 0, and maybe (this needs to be checked) the training task, somewhere down the line, assigns these positions a loss weight of 0 by default. I doubt it, though, just from glancing over the trax implementation: the default id_to_mask in add_loss_weights is None, and no loss weights means the model gets penalized for not predicting the pad tokens correctly, and it probably can get away with this.

So it could be just a mistake that we can get away with, or the default padding value of 0 is accounted for somewhere in trax.
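If it helps to see the first part concretely, here is a small sketch (with a made-up batch that uses 0 as the pad id, like the previous week) comparing the weights that add_loss_weights itself produces with and without id_to_mask:

import numpy as np
import trax

# One made-up (inputs, targets) batch where 0 is the pad id.
def batches():
    x = np.array([[5, 8, 1, 0, 0, 0]])
    y = np.array([[7, 2, 1, 0, 0, 0]])
    while True:
        yield (x, y)

# Default id_to_mask=None: add_loss_weights itself gives every position weight 1,
# so nothing at this stage excludes the padded 0s from the loss.
_, _, w_default = next(trax.data.inputs.add_loss_weights(batches()))
print(w_default)  # [[1. 1. 1. 1. 1. 1.]]

# With id_to_mask=0 the padded positions get weight 0.
_, _, w_masked = next(trax.data.inputs.add_loss_weights(batches(), id_to_mask=0))
print(w_masked)   # [[1. 1. 1. 0. 0. 0.]]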


Hi @arvyzukai,

What do you mean by 'no loss weights means the model gets penalized for not predicting the pad tokens correctly, and it probably can get away with this'? Can you explain how this penalization is related to loss weights?

Hi @Aaditya1

Let me explain with an example from the previous week's assignment. Here is a simple batch sample of 2:
[image: a batch of two sequences padded with 0s]

If the loss is not "masked out" according to the weights ([1, 1, 1, 0, 0, 0, …, 0]), then the model also has to predict each padding 0 correctly. That is not a very hard task (it's easy to predict that, after the 1, it's always 0s), so that is why I say the model training can get away with this.

But a correct implementation should account for the padded tokens and should not accumulate the loss where the mask is 0.

In other words, the training updates the model's weights indistinguishably whether the token is 49 or 0 (but in reality we care more about 49 or 50 than about the 0s).
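To make the penalization concrete, here is a hand-rolled sketch of a weighted loss (with made-up per-token cross-entropy values, not the actual trax CrossEntropyLoss code):

import numpy as np

# Made-up per-token cross-entropy values for one sequence of length 6,
# where the last three positions are padding.
per_token_ce = np.array([0.7, 0.3, 1.2, 0.01, 0.02, 0.01])

unmasked = np.array([1., 1., 1., 1., 1., 1.])  # no masking: pads count
masked   = np.array([1., 1., 1., 0., 0., 0.])  # pads masked out

def weighted_loss(ce, weights):
    # Average the loss only over positions with non-zero weight.
    return np.sum(ce * weights) / np.sum(weights)

print(weighted_loss(per_token_ce, unmasked))  # ~0.373 - easy pad predictions dilute the loss
print(weighted_loss(per_token_ce, masked))    # ~0.733 - only real tokens contribute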

P.S. Checking whether this is true takes some time; when I (or someone else) get the time, I could answer whether this is the case, i.e. whether the loss is actually calculated for padded tokens.


Thanks a lot @arvyzukai, I got an intuition of what you are trying to explain. Please let me know whenever you check whether the loss is being calculated for the padded tokens.