From what I understood in the lessons, mini-batch gradient descent is intended to speed up the learning process when we have a very large amount of data (on the order of millions of examples).
However, in that week's practice exercise we used it on a dataset of m = 300, and it still helped us get better results faster.
Should we then consider using mini-batch gradient descent (with momentum, RMSprop, or Adam) regardless of the size of our training set?
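To make the question concrete, here is a rough sketch of what I mean by using mini-batches even on m = 300, written in plain NumPy. The batch size of 32, the synthetic data, and the simple logistic-regression model are just my own assumptions for illustration, not taken from the assignment:

```python
import numpy as np

# Synthetic data: m = 300 examples, 2 features (assumed for illustration)
rng = np.random.default_rng(0)
m = 300
X = rng.normal(size=(m, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Simple logistic-regression parameters
w = np.zeros(2)
b = 0.0
learning_rate = 0.1
batch_size = 32  # assumed batch size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(100):
    # Shuffle the data at the start of each epoch
    perm = rng.permutation(m)
    X_shuffled, y_shuffled = X[perm], y[perm]

    # Loop over mini-batches instead of the full training set
    for start in range(0, m, batch_size):
        X_batch = X_shuffled[start:start + batch_size]
        y_batch = y_shuffled[start:start + batch_size]

        # Forward pass and gradients computed on this mini-batch only
        y_hat = sigmoid(X_batch @ w + b)
        dw = X_batch.T @ (y_hat - y_batch) / len(y_batch)
        db = np.mean(y_hat - y_batch)

        # Parameters are updated once per mini-batch, so we take
        # many more gradient steps per epoch than full-batch GD
        w -= learning_rate * dw
        b -= learning_rate * db
```

Even with only 300 examples, the parameters get updated roughly ten times per epoch instead of once, which seems to be why it converges faster in practice.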