Do only transformers need padding to max_length?

Since an LSTM can handle variable-length inputs, it doesn't need padding to make all inputs the same length. It needs equal-length inputs only when batch training is used, since a batch requires equal lengths. This requirement is different from that of a transformer, where the model itself requires inputs of equal length. Is this right?

Or, more generally: whenever a batch (batch_size > 1) is used in training, padding must be used regardless of the model type. Right or wrong?

In TensorFlow, NNs use batch as the 0th dimension. When the batch size is greater than 1, padding is required during both training and inference.

For LSTMs, the input has shape (batch_size, sequence_length, num_features_per_timestep). For transformers, the model input is (batch_size, sequence_length); the last dimension is created by the embedding layer before the input reaches the encoder / decoder.
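
Here's a rough sketch of that shape difference (the dimension sizes and variable names below are just made up for illustration). The transformer-style input of token ids picks up its feature dimension from an embedding layer:

import tensorflow as tf

BATCH_SIZE, SEQ_LEN, D_MODEL = 4, 7, 16
VOCAB_SIZE = 100

# LSTM-style input: features are already present per timestep
lstm_inputs = tf.random.uniform((BATCH_SIZE, SEQ_LEN, 10))  # (4, 7, 10)

# Transformer-style input: integer token ids, no feature dimension yet
token_ids = tf.random.uniform((BATCH_SIZE, SEQ_LEN), maxval=VOCAB_SIZE, dtype=tf.int32)

# The embedding layer creates the last dimension before the encoder / decoder
embedding = tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=D_MODEL)
embedded = embedding(token_ids)
print(embedded.shape)  # (4, 7, 16)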

It's sufficient to pad a batch to the longest sequence length within that batch. This helps when your dataset's length distribution is skewed, with only a few very long sentences: it saves compute, and since the sequences are shorter, backprop may be more effective in the case of LSTMs.
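
As a minimal sketch of per-batch padding (the toy sequences below are made up for illustration), tf.data's padded_batch pads each batch only to the longest sequence inside that batch by default:

import tensorflow as tf

# A toy dataset of variable-length token-id sequences
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]

dataset = tf.data.Dataset.from_generator(
    lambda: iter(sequences),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32)
)

# Each batch is padded only to the longest sequence within that batch
for batch in dataset.padded_batch(2):
    print(batch.shape)  # (2, 3) for the first batch, (2, 4) for the second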


So, when we say that 'an LSTM can deal with inputs of variable length', does it mean no batch will be used? Because whenever a batch is used, the lengths must be the same.

But equal length is only required within one batch, so different batches can still have different lengths. In code, though, there always seems to be a single max_length variable, with no separate max_lengths for different batches.

I haven't come across an NN where the batch dimension wasn't required. When people say there is no batch, it means the batch size is 1. That said, do check with the model vendor whether their custom model / library uses a batch construct.

It's common to pad the entire dataset to the maximum length of a single row. This works well for smaller problems and when you have sufficient GPU memory.
You'll run into OutOfMemory issues when the GPU doesn't have enough memory. Common tricks are lowering the batch size for lengthy inputs and padding each batch to that batch's maximum length.
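
A minimal sketch of the whole-dataset approach (the toy sequences are made up for illustration); pad_sequences defaults to the longest row when maxlen isn't given, while the padded_batch sketch above shows the per-batch alternative:

import tensorflow as tf

# Hypothetical tokenized dataset with a skewed length distribution
sequences = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10, 11]]

# Pad the entire dataset to the length of the longest row (here 6)
padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, padding='post')
print(padded.shape)  # (3, 6)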

The assignment is meant to give you a flavor of LSTM and not be an exhaustive tutorial on it.

Please see an example below where the same NN is used with two inputs whose batch size and sequence length dimensions differ:

import tensorflow as tf
from tensorflow.keras import layers

FEATURES_PER_TIMESTEP = 10

# Sequence length is left as None, so the model accepts any length
model = tf.keras.Sequential([
    layers.LSTM(units=32, input_shape=(None, FEATURES_PER_TIMESTEP)),
    layers.Dense(1)
])

# batch size = 32
# sequence length = 10
inputs = tf.random.uniform((32, 10, FEATURES_PER_TIMESTEP))
outputs = model(inputs)
print(outputs.shape) # (32, 1)

# batch size = 2
# sequence length = 5
inputs = tf.random.uniform((2, 5, FEATURES_PER_TIMESTEP))
outputs = model(inputs)
print(outputs.shape) # (2, 1)

But in your example code, does the second inputs overwrite the first inputs across the two calls, or do they co-exist?

Models don't hold on to inputs across calls to __call__.

Could you be more specific on your last comment? Thanks @balaji.ambresh

Once the inputs are used to generate the output, they can be reassigned to different values. The model doesn't need to keep track of the inputs; it only cares about its internal state (i.e., its parameters). So, the example is valid.

Invoking model(inputs) calls the __call__ method of the model. Please brush up on Python to see how a call to an object is resolved.
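
As a small illustration of that resolution (a toy class, not the Keras implementation): calling an object invokes its class's __call__, and nothing forces it to keep the inputs around:

class TinyModel:
    def __init__(self, weight):
        self.weight = weight  # internal state (parameter) the model keeps

    def __call__(self, inputs):
        # Uses the inputs to compute an output, then forgets them
        return [x * self.weight for x in inputs]

model = TinyModel(weight=2)

inputs = [1, 2, 3]
print(model(inputs))  # [2, 4, 6]

# Reassigning the variable later has no effect on the model
inputs = [10, 20]
print(model(inputs))  # [20, 40]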

Thanks for the explanation.