Choosing a Batch Size for Input and Model.fit() in an LSTM model

Classroom item of discussion: Batch Sizing
Hello there, I am trying to build an LSTM model that uses leave-one-trial-out validation/testing. I have 10 total trials, and I want to train the model on 9 of these trials and test/predict using the 10th trial. The training data consists of 9 trials of time-series data with input shape = (9 trials/batches, 540 time steps per trial, 63 features).

I want to train the model on the data from all 9 trials while preserving the time-series nature of the data. I am trying to understand what my batch size should be when training the LSTM model. If I set batch_size = 1 in my model.fit() call, is that training on all the data at once, or on one trial/batch at a time? Is the batch_size in model.fit() supposed to be the same as the batch size of the data? So in my case, should the batch size be 9 in order to train on each trial at once? I am trying to understand what the batch size in the input versus the batch_size in model.fit() each do, and how to choose the appropriate training batch size for my application.

TensorFlow takes care of vectorization across layers, so the model batch size is the same as the batch size seen by the LSTM layer. The model batch size should be 9 in your case.

Do be mindful of the input shape to the LSTM layer, which is of the form (batch size, # timesteps, # features per timestep). In TensorFlow, specifying the LSTM input shape as [None, 63] is sufficient, since unless return_sequences is set to True, only the output of the last timestep is returned.
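For example, a model along these lines accepts input of shape (batch, timesteps, 63); the layer sizes here are my own illustrative choices, not from the thread:

```python
import tensorflow as tf

# input shape (None, 63) = any number of timesteps, 63 features per timestep;
# the batch dimension is implicit and supplied at fit/predict time.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 63)),
    tf.keras.layers.LSTM(32),    # return_sequences=False: only last timestep's output
    tf.keras.layers.Dense(63),
])
model.summary()
```

Because return_sequences defaults to False, the LSTM emits a single vector per sequence, so the model output has shape (batch, 63).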


Hi there, thank you so much for your response! If I have been using a mini batch size of 1 in my model.fit() function, what does that mean? Is it training on all 9 trials of data at once, or on 1 trial at a time? I truly appreciate all your help!

hi @museck

Also, may I ask: are these clinical trials you are training on?

Also, is the reason for using a mini batch size of 1 stochastic gradient descent?

A mini batch is a subset of your batch, so if your mini batch size is 1 and your batch size is 9 (not trials, but batch size), then:

With a mini-batch size of 1 and a batch size of 9, the number of training samples is 9 (the batch size is the number of samples in each batch). With a mini-batch size of 1, each batch consists of 9 mini-batches (9/1 = 9).

Each epoch is one complete training pass over the dataset, i.e. over all 9 trials. If your doubt is rather whether your 9 trials are trained on simultaneously within a batch, that depends on your input shape, which appears to do so in your case.


You’re welcome.

Let’s get some lingo out of the way.

Consider the case where you have 100 data points for training. This means that the training batch size is 100. A mini batch is the subset of the training data fed to the model per training step. When the mini batch size equals the size of the entire training set, it's called batch gradient descent. Consider taking the Machine Learning Specialization offered by deeplearning.ai if you're curious about the benefits of training a model with mini batch gradient descent instead of batch gradient descent.

If the mini batch size is 32, the entire training set is exhausted in 4 steps (3 mini batches of 32 data points each and 1 mini batch of 4 data points). Finally, an epoch refers to one complete pass over the entire dataset. In our case, there will be 4 steps per epoch.

In TensorFlow, the mini batch size is what the batch_size argument refers to.
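The step arithmetic above can be sketched directly (the variable names are just for illustration):

```python
import math

n_samples = 100   # training set size from the example above
batch_size = 32   # the "mini batch size" -- what Keras calls batch_size

# The last step may be a partial mini batch, hence the ceiling.
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)  # 4: three full mini batches of 32, plus one of 4
```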

Back to your problem:

Here are 2 ways to solve your problem:

Viewpoint 1:

  1. Each mini batch fed to the model has 9 trials' worth of data, where each trial has shape [540, 63]. The model input shape is [9, 540, 63].
  2. For each input, the model has to predict an output of shape [540, 63], i.e. the features for the 10th timestep. The final output shape is [9, 540, 63].

With this approach, the model learns to predict [540, 63] data points based on 1 trial.
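A minimal sketch of Viewpoint 1; the LSTM width is an illustrative assumption, and return_sequences=True keeps a prediction for every timestep:

```python
import numpy as np
import tensorflow as tf

# One mini batch: 9 trials, each [540 timesteps, 63 features].
x = np.random.rand(9, 540, 63).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(540, 63)),
    tf.keras.layers.LSTM(64, return_sequences=True),  # output for all 540 timesteps
    tf.keras.layers.Dense(63),                        # applied independently per timestep
])
print(model(x).shape)  # (9, 540, 63)
```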

Viewpoint 2 (seems like this is what you want):

  1. Each row within a mini batch consists of 540 time steps, where each timestep has shape [9, 63]. The model input shape is [batch_size, 540, 9, 63].
  2. The model is expected to predict 1 step into the future (i.e. for the 10th trial), with output shape [batch_size, 540, 63].

In this case, if mini batch size is set to 1, input shape is [1, 540, 9, 63] and output shape is [1, 540, 63].