Questions regarding NLP course 3

Q-1: What is the purpose of this line?
lr_schedule=trax.lr.warmup_and_rsqrt_decay(400, 0.01)
I tried searching the Trax documentation but was unable to understand it!

Q-2: Why are we equalizing the input text lengths within each batch?
In the week 2 assignment we passed the max_length parameter to data_generator, and it applied to all the batches. That made sense to me: use the same length for training, validation, and test.

But in weeks 3 and 4 each batch contains inputs of a different length. Why?

What I understand even less is why we are padding in the first place. RNNs are supposed to be the way to handle data whose length is not fixed! A plain neural network needs a fixed input size, and that is exactly the problem RNNs solve, yet now I see padding everywhere. I don’t know why we are doing this. On top of that, padding a sequence changes the meaning of the input: the RNN doesn’t know that the padded 0s are only there to equalize lengths, and I think the padded positions also affect the weights and biases!

Q-3: Why does data_generator from week 2 yield two Xs?
yield batch_np_arr, batch_np_arr, mask_np_arr

[I also asked some other questions about Trax. If you can answer those as well, please do check them out:]
(Creating a GRU model using Trax)

Q1: There is an explanation of learning rate warmup (Section 12.11.3.4, “Warmup”) here:
https://d2l.ai/chapter_optimization/lr-scheduler.html#warmup
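If it helps, here is a rough re-creation of what that schedule does, written as a sketch in plain Python (this is an illustration of the idea, not the exact Trax implementation). As I understand it, with warmup_and_rsqrt_decay(400, 0.01) the learning rate ramps up linearly to 0.01 over the first 400 steps and then decays proportionally to 1/sqrt(step):

```python
import math

def warmup_and_rsqrt_decay_sketch(n_warmup_steps, max_value):
    """Sketch of warmup + reciprocal-square-root decay (not Trax's exact code)."""
    def schedule(step):
        step = max(step, 1)
        if step < n_warmup_steps:
            return max_value * step / n_warmup_steps          # linear warmup
        return max_value * math.sqrt(n_warmup_steps / step)   # 1/sqrt(step) decay
    return schedule

lr = warmup_and_rsqrt_decay_sketch(400, 0.01)
print(lr(1), lr(400), lr(1600))  # ~2.5e-05, 0.01, 0.005
```

The warmup avoids huge, unstable updates while the weights are still random; the rsqrt decay then shrinks the step size as training converges.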

Q2: Because we are doing matrix multiplication: when you batch the data, every sequence in the batch has to have the same length, otherwise you cannot stack them into a single array and do the matrix multiply. If you fed the model a single example at a time (no batching) you would not need to pad, but learning is much faster with mini-batches.
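To make that concrete, here is a minimal NumPy sketch with made-up token IDs (0 is assumed to be the padding ID) showing why padding is needed before a batch can become one rectangular array:

```python
import numpy as np

# Hypothetical mini-batch of tokenized sentences with different lengths.
batch = [[12, 5, 87], [9, 33], [4, 18, 2, 76, 5]]

max_len = max(len(seq) for seq in batch)
padded = np.array([seq + [0] * (max_len - len(seq)) for seq in batch])
mask = (padded != 0).astype(np.int32)  # 1 for real tokens, 0 for padding

print(padded.shape)  # (3, 5) -- now the whole batch can be multiplied as one matrix
```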

The RNN “knows” where the inputs are padded with the help of the mask: mask_np_arr tells the loss which (padded) predictions do not matter, so those positions contribute nothing and the model weights are not updated because of them.
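A toy illustration of that masking (made-up numbers; this mirrors the idea of passing the mask as per-token weights to the loss, not the exact Trax loss code):

```python
import numpy as np

# Toy per-token losses for one sequence of length 5,
# where the last two positions are padding.
per_token_loss = np.array([0.7, 1.2, 0.4, 0.9, 1.1])
mask           = np.array([1,   1,   1,   0,   0])

# Padded positions contribute nothing to the loss, so they produce no gradient
# and do not move the weights.
masked_loss = np.sum(per_token_loss * mask) / np.sum(mask)
print(masked_loss)  # average over the 3 real tokens only
```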

Q3: The batch is a tuple of three parts: inputs, targets, and mask. The inputs and targets are identical here; the second element is what your predictions are evaluated against, and the mask is 1 for non-padding tokens and 0 for padding.
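As a sketch of the shape of such a generator (a hypothetical simplified version, not the assignment’s exact data_generator), a batch for this kind of next-token prediction task could be produced like this:

```python
import numpy as np

def simple_batch_generator(tokenized_lines, batch_size, pad_id=0):
    """Hypothetical generator yielding (inputs, targets, mask) batches."""
    while True:
        idx = np.random.randint(0, len(tokenized_lines), size=batch_size)
        lines = [tokenized_lines[i] for i in idx]
        max_len = max(len(line) for line in lines)
        batch = np.array([line + [pad_id] * (max_len - len(line)) for line in lines])
        mask = (batch != pad_id).astype(np.int32)
        # Inputs and targets are the same array: the model learns to predict
        # each token from the tokens before it.
        yield batch, batch, mask
```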