Max_len different for each batch in Siamese network assignment

I’m trying to understand why it is not an issue to have samples (questions) of different sizes in each batch in the Siamese networks assignment.

This is from the notebook:

  • if len(input1) == batch_size, determine `max_len` as the length of the longest question in `input1` and `input2`. Ceil `max_len` to a power of 2 (for computation purposes) using the following command: `max_len = 2**int(np.ceil(np.log2(max_len)))` (see the sketch below).
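
For instance, here is a minimal sketch of that step, assuming `input1` and `input2` are lists of tokenized questions (lists of token ids); the toy data below is made up for illustration:

```python
import numpy as np

# Hypothetical toy batch: two lists of tokenized questions (token ids).
input1 = [[1, 2, 3], [4, 5, 6, 7, 8]]
input2 = [[9, 10], [11, 12, 13, 14, 15, 16]]

# Longest question across both inputs.
max_len = max(len(q) for q in input1 + input2)  # 6

# Ceil max_len to the next power of 2 (for computation purposes).
max_len = 2**int(np.ceil(np.log2(max_len)))  # 8
```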

Doing the above, `max_len` may not be the same for every batch, so we will have different dimensions across batches. My understanding was that we’d want the same input size for every batch fed into the network - is that not correct?

Hi @Mohammad_Atif_Khan

There is no reason for all mini-batches to have the same dimensions as each other, because we update the model weights the same way regardless (the inner dimensions of a mini-batch do not matter).

On the other hand, every input within a batch has to be the same length. This is because the length of each row or column (or any other dimension) cannot vary inside a matrix (lengths can vary in lists, but not in matrices). A matrix of shape (2, 3, 10) cannot be the combination of (1, 3, 5) and (1, 3, 7) as they are - you have to “pad” the last dimension of each to get (2, 3, 10).
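
Here is a minimal NumPy sketch of that padding step (the shapes mirror the example above; `np.pad` with its default zero-fill is standard NumPy, and the rest is made up for illustration):

```python
import numpy as np

a = np.ones((1, 3, 5))   # one sample, 3 rows, length-5 sequences
b = np.ones((1, 3, 7))   # same shape except the last dimension

target_len = 10          # common length to pad to

# Zero-pad only the last dimension, on the right.
a_padded = np.pad(a, ((0, 0), (0, 0), (0, target_len - a.shape[-1])))
b_padded = np.pad(b, ((0, 0), (0, 0), (0, target_len - b.shape[-1])))

combined = np.concatenate([a_padded, b_padded], axis=0)
print(combined.shape)    # (2, 3, 10)
```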

Hi @arvyzukai ,
so in essence, we are saying that an LSTM can process variable-sized input streams (sentences in this case), but we only pad due to the limitations imposed by the data structures (matrices) and parallel computing?

Hi @Mohammad_Atif_Khan

> we only pad due to the limitations imposed by the data structures (matrices) and parallel computing?

That is correct.

We need the “padding” trick because of mini-batches. If we were training with pure SGD (one sentence at a time), we would not need padding, since we could process a sentence of any length, but training would not be efficient (SGD would update the model weights one sentence at a time, which is known to be very slow, and sometimes convergence is not possible at all). If you are interested in learning more, I would recommend reading about SGD and mini-batch SGD.
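
To make the contrast concrete, here is a hedged toy sketch of the two update schemes on a simple linear model (the data, loss, and learning rate are made up; only the update pattern matters):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w = np.zeros(3)
lr = 0.01

def grad(w, xb, yb):
    # Gradient of mean squared error for a linear model.
    return 2 * xb.T @ (xb @ w - yb) / len(yb)

# Pure SGD: one example per update. No padding needed (each example could
# have its own length), but 100 noisy updates per epoch -- slow.
for i in range(len(X)):
    w -= lr * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch SGD: one update per batch of 25. Fewer, better-averaged
# updates, but every example in a batch must share one shape -- hence
# padding when the examples are sentences.
for i in range(0, len(X), 25):
    w -= lr * grad(w, X[i:i+25], y[i:i+25])
```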
