Mini-Batches Dataset question

Hi guys,

I have a question about mini-batches. Does the data in the different mini-batches have to adhere to certain rules, e.g. follow a certain distribution, or can the batches simply be selected at random (random data per batch)?

For example, if you are tackling a computer vision problem, detecting objects/keypoints on objects, I can imagine that you would have images of the objects from different distances, viewpoints, lighting conditions, etc. Does each mini-batch then have to contain the same distribution of images, i.e. different distances, viewpoints, etc., so as to be representative of the total dataset?

I hope my question is clear.



Keras's model fit, if the data is not coming from a generator, shuffles the data by default before mini-batching. This provides better generalization in most cases. The dataset you described sounds like it would come from a generator; if that is the case, the “shuffle” parameter is ignored. Source: Keras Fit
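To make the shuffle-then-batch behavior concrete, here is a minimal NumPy sketch of what that amounts to (the helper name `make_minibatches` is mine, not the actual Keras implementation): reshuffle the sample indices once per epoch, then slice them into consecutive mini-batches.

```python
import numpy as np

def make_minibatches(X, y, batch_size, shuffle=True, seed=0):
    # Hypothetical helper mimicking shuffle-before-batching:
    # permute the indices once, then take consecutive slices.
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    return [(X[idx[i:i + batch_size]], y[idx[i:i + batch_size]])
            for i in range(0, len(X), batch_size)]

X = np.arange(10, dtype=float).reshape(10, 1)
y = np.arange(10)
batches = make_minibatches(X, y, batch_size=4)  # 3 batches of sizes 4, 4, 2
```

Every sample still appears exactly once per epoch; only the grouping into batches changes from epoch to epoch (by varying the seed).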


I was looking for some information about the ordering within the mini-batch.
Could changing the order and composition of the data within each mini-batch have an impact on the performance of our algorithm? Could it then be seen as another “hyperparameter” to be tweaked as part of optimization?
Thank you

Hello @atriki,

I believe the data distribution in each mini-batch matters. For example, a randomized mini-batch and a mini-batch of samples that all share the same label value should behave very differently. This is because the cost surface we are optimizing on in a particular step DEPENDS on the data of that mini-batch. In other words, the cost surface keeps changing from one mini-batch to another, and gradient descent only ever optimizes against the cost surface of the current mini-batch.

Therefore, with 100 mini-batches, we have 100 cost surfaces, and we hope those cost surfaces will lead us to a set of neural network weights that generalize to test data very well.
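As a toy illustration of how batch composition changes the cost surface (my own sketch, not from the course): fit only the bias of a logistic model. A balanced mini-batch has its optimum at b = 0, while a mini-batch of all-positive labels pushes the optimum as far positive as the search range allows.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_loss(b, labels):
    # Binary cross-entropy for a bias-only model: p = sigmoid(b) for all samples
    p = sigmoid(b)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

mixed    = np.array([0., 1., 0., 1., 0., 1., 0., 1.])  # balanced mini-batch
all_ones = np.ones(8)                                  # single-label mini-batch

b_grid = np.linspace(-5, 5, 1001)
best_mixed = b_grid[np.argmin([batch_loss(b, mixed) for b in b_grid])]
best_ones  = b_grid[np.argmin([batch_loss(b, all_ones) for b in b_grid])]
# best_mixed sits at 0; best_ones is driven to the edge of the grid
```

Each mini-batch defines its own loss-versus-weights surface, so the two batches pull the parameter in very different directions.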

A randomized mini-batch is the most common approach. Sometimes we may pass the whole mini-batch through the NN but only use part of it when calculating the loss value.
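Using only part of a mini-batch in the loss is usually done with a mask or per-sample weights. A small sketch (the helper name `masked_mse` is mine): a 0/1 mask selects which samples contribute to the mean squared error.

```python
import numpy as np

def masked_mse(y_true, y_pred, mask):
    # Average the squared error only over samples where mask == 1
    sq_err = (y_true - y_pred) ** 2
    return float((sq_err * mask).sum() / mask.sum())

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 5.0, 4.0])
mask   = np.array([1.0, 1.0, 0.0, 1.0])  # exclude the third sample

loss = masked_mse(y_true, y_pred, mask)  # the large error on sample 3 is ignored
```

The whole batch still flows through the forward pass, but the excluded samples produce no gradient signal.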