The best mini-batch size is usually not 1 and not m, but something in between

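Yes, I think the statement in your title is true. Prof Ng discusses this in the lectures on mini-batch gradient descent. A mini-batch size of 1 is stochastic gradient descent, which gives up the speedup from vectorizing over many examples at once. A size of m is batch gradient descent, which takes too long per iteration when the training set is large. There is also the famous Yann LeCun quote on this subject: “Friends don’t let friends use minibatch sizes larger than 32”.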
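To make the "1 vs. m vs. in-between" distinction concrete, here is a minimal pure-Python sketch of mini-batch gradient descent. It assumes a toy 1-D linear regression with mean-squared-error loss. The function names `make_mini_batches` and `minibatch_gd` are hypothetical and are not the course's assignment API. The same code covers all three cases, and only `batch_size` changes.

```python
import random

# Hypothetical helper (not the course's API): shuffle (X, Y) together and
# split them into mini-batches of `batch_size` examples.
def make_mini_batches(X, Y, batch_size, seed=0):
    # batch_size=1      -> stochastic gradient descent (m batches of one example)
    # batch_size=len(X) -> batch gradient descent (one batch of all m examples)
    # The last batch is smaller when m is not a multiple of batch_size.
    m = len(X)
    indices = list(range(m))
    random.Random(seed).shuffle(indices)
    batches = []
    for start in range(0, m, batch_size):
        idx = indices[start:start + batch_size]
        batches.append(([X[i] for i in idx], [Y[i] for i in idx]))
    return batches

# Hypothetical toy model: fit y ~ w*x + b by mini-batch gradient descent.
def minibatch_gd(X, Y, batch_size, lr=0.1, epochs=50, seed=0):
    w, b = 0.0, 0.0
    for epoch in range(epochs):
        # Reshuffle every epoch so the batches differ between passes.
        for xb, yb in make_mini_batches(X, Y, batch_size, seed=seed + epoch):
            n = len(xb)
            # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
            errs = [w * x + b - y for x, y in zip(xb, yb)]
            grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xb))
            grad_b = 2.0 / n * sum(errs)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```

With m = 100 examples, `batch_size=1` yields 100 parameter updates per epoch, `batch_size=32` yields 4 (the last batch holding 4 examples), and `batch_size=100` yields a single update per epoch. The in-between sizes keep most of the per-step vectorization benefit while still updating the parameters several times per pass over the data.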

Here’s another thread which discusses this point in a bit more detail.
