Week 2 - When to use mini-batch gradient descent

From what I understood in the lessons, mini-batch gradient descent is intended to speed up learning when we have a large amount of data (on the order of millions of examples).

However, in that week's practice exercise we used it on a dataset of only m = 300 examples, and it still helped us get better results faster.

Should we then consider using mini-batch gradient descent (whether with momentum, RMSprop, or Adam) regardless of the size of our training set?

In general, yes: mini-batch gradient descent is usually a good default for training a neural network, even with a small dataset. It speeds up training because the weights are updated after every mini-batch rather than once per full pass over the data, so you get many more update steps per epoch.
It also reduces memory overhead, since only one mini-batch needs to be loaded into memory at a time rather than the entire training set.
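To make the "more updates per epoch" point concrete, here is a minimal sketch of the mini-batch partitioning step in NumPy. It assumes the column-per-example layout used in the course (X of shape (n_x, m), Y of shape (1, m)); the function name and batch size of 64 are illustrative choices, not the assignment's exact code.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the (X, Y) pairs and partition them into mini-batches.

    Assumes column-per-example layout: X is (n_x, m), Y is (1, m).
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)              # reshuffle so each epoch sees a fresh split
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [
        (X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
        for k in range(0, m, batch_size)   # last batch may be smaller than batch_size
    ]

# With m = 300 and batch_size = 64: four full batches plus one of 44 examples,
# i.e. 5 weight updates per epoch instead of the single update batch GD makes.
X = np.random.randn(2, 300)
Y = np.random.randn(1, 300)
batches = random_mini_batches(X, Y, batch_size=64)
print(len(batches))            # 5
print(batches[-1][0].shape)    # (2, 44)
```

So even at m = 300, one epoch already gives five gradient steps instead of one, which is why you saw faster progress in the exercise.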