Q-1: What is the purpose of this line?
lr_schedule=trax.lr.warmup_and_rsqrt_decay(400, 0.01)
I tried searching the Trax documentation but wasn't able to understand it!
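From what I can tell (this is my own sketch, not Trax's actual source), `warmup_and_rsqrt_decay(n_warmup_steps, max_value)` builds a learning-rate schedule that ramps linearly up to `max_value` over the warm-up steps and then decays proportionally to `1/sqrt(step)`. Something shaped like:

```python
import math

def warmup_and_rsqrt_decay(n_warmup_steps, max_value):
    """Sketch of a warm-up + reciprocal-square-root-decay schedule.

    Linear ramp to max_value over n_warmup_steps, then 1/sqrt(step) decay.
    """
    def schedule(step):
        warmup = step / n_warmup_steps            # linear ramp-up phase
        decay = math.sqrt(n_warmup_steps / step)  # rsqrt decay phase
        return max_value * min(warmup, decay)     # whichever phase is active
    return schedule

lr = warmup_and_rsqrt_decay(400, 0.01)
print(lr(100))   # warming up: 0.0025
print(lr(400))   # peak: 0.01
print(lr(1600))  # decaying: 0.005
```

So `(400, 0.01)` would mean: increase the learning rate over the first 400 steps until it reaches 0.01, then decay it. Warm-up helps early training stability; the decay lets the optimizer settle later on.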
Q-2: Why do we equalize the input text lengths within each batch?
In the week 2 assignment we passed the max_length
parameter to data_generator,
which applied to all the batches. That makes sense to me: use the same length across training, validation, and test.
But in weeks 3 and 4 each batch contains inputs of a different length. Why?
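My best guess (please correct me if I'm wrong): if every batch were padded to one global max_length, short sentences would waste most of their positions on padding. Bucketing groups sentences of similar length together and pads each batch only to its bucket's boundary. A toy sketch with made-up boundaries:

```python
# Toy sketch of bucketing by length; the boundaries are hypothetical.
boundaries = [4, 8, 16]

def bucket_for(seq):
    """Return the smallest boundary that fits the sequence."""
    for b in boundaries:
        if len(seq) <= b:
            return b
    return boundaries[-1]

sentences = [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
for s in sentences:
    b = bucket_for(s)
    padded = s + [0] * (b - len(s))  # pad only up to the bucket boundary
    print(len(padded))  # 4, 8, 4 — far less padding than one global length
```

That would explain why batches have different lengths from each other while sequences inside a batch share one length.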
What I understand even less is why we are padding in the first place. RNNs are exactly the tool for variable-length data! In a simple neural network the input size is fixed, and that is the very problem RNNs solve, yet now I see padding everywhere. Also, padding the data seems to change the meaning of the input: the RNN doesn't know that the padded 0s are only there to equalize lengths, so I'd think the padded positions also affect the weights and biases!
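My current mental model for why padding is still needed: the hardware wants rectangular tensors, so a batch must have one shape, and the mask then excludes the padded positions from the loss so they contribute nothing to the gradients (and hence don't change the weights). A small NumPy sketch of a masked loss, with hypothetical numbers:

```python
import numpy as np

# Hypothetical batch: two sequences padded with id 0 to length 5.
per_token_loss = np.array([[0.2, 0.1, 0.3, 0.4, 0.5],
                           [0.6, 0.2, 0.0, 0.0, 0.0]])
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 0, 0, 0]])  # 0 marks padding positions

# Average the loss over real tokens only; padded positions are zeroed out.
masked_loss = (per_token_loss * mask).sum() / mask.sum()
print(masked_loss)  # ≈ 0.3286, averaged over the 7 real tokens
```

If that's right, the 0s never influence training because the mask removes them from the objective. Is that the intended design?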
Q-3: Why does data_generator from week 2 yield two Xs?
yield batch_np_arr, batch_np_arr, mask_np_arr
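My guess at an answer (hedged, based on the language-modeling setup): the training loop expects `(inputs, targets, mask)` triples, and for a language model the targets are the inputs themselves, since the model shifts them internally (e.g. with a `ShiftRight` layer) so that position t predicts token t+1. A toy sketch:

```python
import numpy as np

def toy_data_generator(batch, pad_id=0):
    """Hypothetical sketch: yield (inputs, targets, mask) for a language
    model. Targets are the same array as inputs because the model is
    expected to shift them internally before computing the loss."""
    batch_np_arr = np.array(batch)
    mask_np_arr = (batch_np_arr != pad_id).astype(np.int32)  # 0 on padding
    yield batch_np_arr, batch_np_arr, mask_np_arr

inputs, targets, mask = next(toy_data_generator([[5, 6, 7, 0], [8, 9, 0, 0]]))
print((inputs == targets).all())  # True — the "two Xs" are input and target
print(mask)                       # [[1 1 1 0] [1 1 0 0]]
```

So the "two Xs" would be one array playing two roles: model input and training target.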
[I have asked some other Trax-related questions as well; if you can answer those too, please do check them out.]
(Creating a GRU model using Trax)