Max_len different for each batch in Siamese network assignment

Mohammad_Atif_Khan · November 22, 2022, 2:56am

I’m trying to understand why it is not an issue to have samples (questions) of a different size in each batch in the Siamese networks assignment.

This is from the notebook:

if len(input1) == batch_size, determine max_len as the longest question in input1 and input2. Ceil max_len to a power of 2 (for computation purposes) using the following command: max_len = 2**int(np.ceil(np.log2(max_len))).
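To make the quoted step concrete, here is a minimal sketch of that computation. The helper name `batch_max_len` and the toy token lists are made up for illustration and are not taken from the notebook; only the `max_len` formula is. It shows two batches ending up with different `max_len` values.

```python
import numpy as np

def batch_max_len(input1, input2):
    """Longest question across both inputs, rounded up to a power of 2."""
    longest = max(len(q) for q in input1 + input2)
    return 2 ** int(np.ceil(np.log2(longest)))

# Two hypothetical batches of tokenized questions (batch_size = 2).
batch_a_q1 = [[4, 8, 15], [16, 23, 42, 7, 9]]
batch_a_q2 = [[1, 2], [3, 4, 5, 6]]
batch_b_q1 = [[1] * 11, [2] * 3]
batch_b_q2 = [[3] * 9, [4] * 6]

len_a = batch_max_len(batch_a_q1, batch_a_q2)  # longest question has 5 tokens
len_b = batch_max_len(batch_b_q1, batch_b_q2)  # longest question has 11 tokens
print(len_a, len_b)  # 8 16
```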

Computed this way, max_len may differ from batch to batch, so the input dimensions will differ across batches. My understanding was that we’d want the same input size for every batch fed into the network. Is that not correct?

arvyzukai · November 22, 2022, 7:52am

Hi @Mohammad_Atif_Khan

There is no need for mini-batches to have the same dimensions as one another, because we update the model weights the same way for every batch (the inner dimensions of the mini-batch, such as max_len, do not matter).

On the other hand, every input within a batch has to be the same length. This is because the length of a row or column (or any other dimension) cannot vary within a matrix (it can vary in lists, but not in matrices). A (2, 3, 10) array cannot be built directly from a (1, 3, 5) array and a (1, 3, 7) array: you have to “pad” the last dimension of each to 10 to get (2, 3, 10).
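As a rough illustration of the padding step, the sketch below pads two token lists of lengths 5 and 7 to a common length so they can be stacked into one array. The helper `pad_batch` and the pad id `0` are assumptions made for this example; the assignment may use a different pad token.

```python
import numpy as np

def pad_batch(sequences, max_len, pad=0):
    """Right-pad each token list to max_len and stack into a 2-D array.

    `pad=0` is an assumed pad id, not necessarily the assignment's.
    """
    return np.array([seq + [pad] * (max_len - len(seq)) for seq in sequences])

questions = [[4, 8, 15, 16, 23], [42, 7, 9, 3, 5, 6, 2]]  # lengths 5 and 7
batch = pad_batch(questions, max_len=8)
print(batch.shape)  # (2, 8)
```

Without the padding, the two lists have different lengths and cannot form a rectangular array.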

Mohammad_Atif_Khan · November 25, 2022, 2:08am

Hi @arvyzukai ,
So in essence, we are saying that an LSTM can process variable-sized input streams (sentences in this case), but we only pad due to the limitation imposed by the data structures (Matrices) and parallel computing?

arvyzukai · November 25, 2022, 6:30am

Hi @Mohammad_Atif_Khan

we only pad due to the limitation imposed by the data structures (Matrices) and parallel computing?

That is correct.

We need the “padding” trick because of mini-batches. If we trained with plain SGD (one sentence at a time), we would not need padding, since we could process a sentence of any length. But training would not be efficient: SGD would update the model weights one sentence at a time, which has been shown to be very slow, and sometimes it would not converge at all. If you are interested in learning more, I would recommend reading about SGD and mini-batch SGD.
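To illustrate why batches with different max_len are not a problem, here is a small sketch using a plain tanh RNN in NumPy (a simplified stand-in, not the LSTM from the assignment). All names, sizes, and random weights are made up for the example, and masking of pad tokens is ignored. None of the parameter shapes depends on the sequence length, so the same weights handle a batch of length 8 and a batch of length 16.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 50, 4  # hypothetical sizes

# Parameters: none of these shapes involves the sequence length.
embed = rng.normal(size=(vocab_size, d_model))
W_x = rng.normal(size=(d_model, d_model))
W_h = rng.normal(size=(d_model, d_model))

def encode(batch):
    """Run a plain tanh RNN over a (batch_size, max_len) array of token ids."""
    h = np.zeros((batch.shape[0], d_model))
    for t in range(batch.shape[1]):       # one step per position, whatever max_len is
        x_t = embed[batch[:, t]]          # (batch_size, d_model)
        h = np.tanh(x_t @ W_x + h @ W_h)  # same weights at every step
    return h

batch_a = rng.integers(0, vocab_size, size=(2, 8))   # max_len = 8
batch_b = rng.integers(0, vocab_size, size=(2, 16))  # max_len = 16
out_a = encode(batch_a)
out_b = encode(batch_b)
print(out_a.shape, out_b.shape)  # (2, 4) (2, 4)
```

Within a single batch, though, every row must share the same length so that `batch[:, t]` is defined for all rows, which is where padding comes in.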
