Do only transformers need padding to max_length?

Since an LSTM can handle variable-length inputs, it doesn't need padding to make all inputs the same length. It needs equal-length inputs only when batch training is used, since a batch requires equal lengths. This requirement is different from that of a transformer, where the model itself requires inputs of equal length. Is this right?

Or, more generally: whenever a batch (batch_size > 1) is used in training, padding must be used regardless of the model type. Right or wrong?

In TensorFlow, NNs use batch as the 0th dimension. When the batch size is greater than 1, padding is required during both training and inference.

For LSTMs, the input has shape (batch_size, sequence_length, num_features_per_timestep). For transformers, the model input is (batch_size, sequence_length); the last dimension is created by the embedding layer before the input reaches the encoder / decoder.
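
Here's a rough sketch of that shape difference (the dimension sizes and variable names below are just made up for illustration). The transformer-style input of token ids picks up its feature dimension from an embedding layer:

import tensorflow as tf

BATCH_SIZE, SEQ_LEN, D_MODEL = 4, 7, 16
VOCAB_SIZE = 100

# LSTM-style input: features are already present per timestep
lstm_inputs = tf.random.uniform((BATCH_SIZE, SEQ_LEN, 10))  # (4, 7, 10)

# Transformer-style input: integer token ids, no feature dimension yet
token_ids = tf.random.uniform((BATCH_SIZE, SEQ_LEN), maxval=VOCAB_SIZE, dtype=tf.int32)

# The embedding layer creates the last dimension before the encoder / decoder
embedding = tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=D_MODEL)
embedded = embedding(token_ids)
print(embedded.shape)  # (4, 7, 16)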

It's sufficient to pad a batch to the longest sequence length within that batch. This helps when your dataset's length distribution is skewed, with only a few very long sentences: it saves compute, and since the sequences are shorter, backprop may be more effective in the case of LSTMs.
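
As a minimal sketch of per-batch padding (the toy sequences below are made up for illustration), tf.data's padded_batch pads each batch only to the longest sequence inside that batch by default:

import tensorflow as tf

# A toy dataset of variable-length token-id sequences
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]

dataset = tf.data.Dataset.from_generator(
    lambda: iter(sequences),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32)
)

# Each batch is padded only to the longest sequence within that batch
for batch in dataset.padded_batch(2):
    print(batch.shape)  # (2, 3) for the first batch, (2, 4) for the second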


So, when we say that 'an LSTM can deal with inputs of variable length', does it mean no batch will be used? Because whenever a batch is used, the lengths must be the same.

But equal length is only required within one batch, so different batches can still have different lengths. In code, though, there always seems to be a single max_length variable, with no separate max_lengths for different batches.

I haven't come across an NN where the batch dimension wasn't required. When people say there is no batch, it means the batch size is 1. That said, do check with the model vendor whether their custom model / library uses a batch construct.

It's common to pad the entire dataset to the maximum length of a single row. This works well for smaller problems and when you have sufficient GPU memory.
You'll run into OutOfMemory issues when the GPU doesn't have enough memory. Common tricks are lowering the batch size for lengthy inputs and padding each batch to that batch's maximum length.
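
A minimal sketch of the whole-dataset approach (the toy sequences are made up for illustration); pad_sequences defaults to the longest row when maxlen isn't given, while the padded_batch sketch above shows the per-batch alternative:

import tensorflow as tf

# Hypothetical tokenized dataset with a skewed length distribution
sequences = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10, 11]]

# Pad the entire dataset to the length of the longest row (here 6)
padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, padding='post')
print(padded.shape)  # (3, 6)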

The assignment is meant to give you a flavor of LSTM and not be an exhaustive tutorial on it.

Please see an example below where the same NN is used with two inputs whose batch size and sequence length dimensions differ:

import tensorflow as tf
from tensorflow.keras import layers

FEATURES_PER_TIMESTEP = 10

# Sequence length is left as None, so the model accepts any length
model = tf.keras.Sequential([
    layers.LSTM(units=32, input_shape=(None, FEATURES_PER_TIMESTEP)),
    layers.Dense(1)
])

# batch size = 32
# sequence length = 10
inputs = tf.random.uniform((32, 10, FEATURES_PER_TIMESTEP))
outputs = model(inputs)
print(outputs.shape) # (32, 1)

# batch size = 2
# sequence length = 5
inputs = tf.random.uniform((2, 5, FEATURES_PER_TIMESTEP))
outputs = model(inputs)
print(outputs.shape) # (2, 1)

But in your example code, does the second inputs overwrite the first inputs across the two calls, or do they co-exist?

Models don't hold on to inputs across calls to __call__.

Could you be more specific on your last comment? Thanks @balaji.ambresh

Once the inputs are used to generate the output, they can be reassigned to different values. The model doesn't need to keep track of the inputs; it only cares about its internal state (i.e., its parameters). So, the example is valid.

Invoking model(inputs) calls the __call__ method of the model. Please brush up on Python to see how a call to an object is resolved.
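
As a small illustration of that resolution (a toy class, not the Keras implementation): calling an object invokes its class's __call__, and nothing forces it to keep the inputs around:

class TinyModel:
    def __init__(self, weight):
        self.weight = weight  # internal state (parameter) the model keeps

    def __call__(self, inputs):
        # Uses the inputs to compute an output, then forgets them
        return [x * self.weight for x in inputs]

model = TinyModel(weight=2)

inputs = [1, 2, 3]
print(model(inputs))  # [2, 4, 6]

# Reassigning the variable later has no effect on the model
inputs = [10, 20]
print(model(inputs))  # [20, 40]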

Thanks for the explanation.