C3_W3_Lab_2_multiple_layer_LSTM

For the dataset that is downloaded in this lab, why do we use padded_batch rather than pad_sequences?

When using padded_batch, the sequences are only padded to the maximum length within each batch (not the maximum length of the whole dataset), which means individual batches can end up with different sequence lengths.
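Here is a minimal sketch (with toy sequences rather than the lab's dataset) showing that each batch is padded only to its own longest sequence:

```python
import tensorflow as tf

# Toy variable-length sequences and labels, just to illustrate the padding behaviour.
sequences = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10, 11]]
labels = [0, 1, 0, 1]

dataset = tf.data.Dataset.from_generator(
    lambda: zip(sequences, labels),
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# padded_batch pads each batch of 2 to that batch's maximum length only.
for batch_seqs, batch_labels in dataset.padded_batch(2):
    print(batch_seqs.shape)  # (2, 3) for the first batch, (2, 5) for the second
```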

When using a Bidirectional LSTM for text classification, doesn't it matter how long each individual sequence is? i.e. isn't the number of iterations of the LSTM cell something that's important?

When using pad_sequences, you waste computation on short sentences: every sentence is padded to the same global maximum length, and the LSTM spends the extra timesteps on padding, where it quickly figures out the trivial EOS -> EOS (padding -> padding) token mapping from the current timestep to the next.
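For comparison, here is a minimal sketch of pad_sequences padding every sentence to the length of the longest one in the whole list (the same toy sequences as above):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

sequences = [[1, 2], [3, 4, 5], [6], [7, 8, 9, 10, 11]]

# Every row is padded to the global maximum length of 5.
padded = pad_sequences(sequences, padding='post')
print(padded.shape)  # (4, 5)
print(padded)
```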

To get around this problem, it is sufficient to run the LSTM only up to the length of the longest sentence in each batch, which is exactly what padded_batch gives you.

For example, if you have one sentence of length 30 while the rest of the dataset (say 10,000 sentences) has a maximum length of 6, training will be faster with padded_batch, since only the batch containing the long sentence is padded to 30.
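A rough back-of-the-envelope count for that scenario (batch size 32 and full batches are my own assumptions; the numbers are illustrative, not measured):

```python
batch_size = 32
num_short = 10_000      # sentences of length <= 6
longest_short = 6
outlier_len = 30        # the single long sentence

# pad_sequences: every sentence is padded to 30 timesteps.
timesteps_global = (num_short + 1) * outlier_len

# padded_batch: only the batch containing the outlier runs 30 timesteps;
# every other batch runs at most 6 (assuming full batches for simplicity).
num_batches = (num_short + 1 + batch_size - 1) // batch_size
timesteps_per_batch = (num_batches - 1) * batch_size * longest_short + batch_size * outlier_len

print(timesteps_global, timesteps_per_batch)  # roughly 300,030 vs 60,864 padded timesteps
```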