Hello, I want to get more insight about mini-batching as I am not sure if my understanding is correct. I want to know whether the number of gradient descent iterations that will now be run would be the size of the mini-batch in the lecture, which is 5000, or whether gradient descent is run 5000 times, once for each mini-batch. Thank you.
Hello @Nnaemeka_Nwankwo
If your batch of data has 1000 samples, and the mini-batch size is 10, then there will be 100 mini-batches, and there will be 100 gradient descent steps in an epoch that goes through the whole batch of samples.
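In case a concrete sketch helps, here is that counting in Python (just a toy example with made-up array shapes, not the course code):

```python
import numpy as np

m = 1000                 # total number of training samples
mini_batch_size = 10     # samples per mini-batch

num_mini_batches = m // mini_batch_size   # 1000 / 10 = 100 mini-batches
print(num_mini_batches)                   # 100 gradient descent steps per epoch

# Slicing a toy dataset into mini-batches of 10 samples each
X = np.random.randn(m, 5)                 # 5 features per sample (made-up shape)
mini_batches = [X[k * mini_batch_size:(k + 1) * mini_batch_size]
                for k in range(num_mini_batches)]
print(len(mini_batches))                  # 100
```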
Cheers,
Raymond
Thank you so much @rmwkwok, this was so helpful. That is to say, there will be 10 epochs? Also, can I run gradient descent more than 100 times for each epoch?
100 mini-batches per epoch, and 10 samples per mini-batch.
The number of gradient descent steps is equal to the number of mini-batches.
This clarified a lot, thank you @rmwkwok.
But of course the number of total epochs you run is another matter entirely. As Raymond says, the definition of an “epoch” of training is one complete pass through the entire training set (all the minibatches). The number of total epochs you need is determined by how well the convergence works. The entire point of minibatch gradient descent is that you are updating the parameters after each minibatch, so you should be able to converge more quickly with fewer total “epochs”. But if you make the minibatch size too small, then you can also get more oscillations and statistical noise in your updates. So you need to choose the minibatch size correctly. In other words the minibatch size is what Prof Ng calls a “hyperparameter”, which means a value that you need to choose rather than one that can be directly learned through back propagation.
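To make that concrete, here is a minimal runnable sketch of mini-batch gradient descent on a made-up linear regression problem (not the assignment code; all names and numbers are just for illustration). The point is that the parameters are updated once per mini-batch, so one epoch over 1000 samples with a mini-batch size of 10 gives 100 updates:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 3                                  # 1000 samples, 3 features
X = rng.normal(size=(m, n))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=m)       # noisy linear targets

w = np.zeros(n)                                 # parameters to learn
learning_rate = 0.1                             # hyperparameters you choose
mini_batch_size = 10
num_epochs = 10

for epoch in range(num_epochs):                 # one epoch = one full pass over the data
    perm = rng.permutation(m)                   # shuffle before slicing into mini-batches
    X_shuf, y_shuf = X[perm], y[perm]
    for k in range(0, m, mini_batch_size):      # 100 mini-batches -> 100 updates per epoch
        X_mb = X_shuf[k:k + mini_batch_size]
        y_mb = y_shuf[k:k + mini_batch_size]
        grad = 2 * X_mb.T @ (X_mb @ w - y_mb) / mini_batch_size  # MSE gradient on this mini-batch
        w -= learning_rate * grad               # parameter update after each mini-batch
    cost = np.mean((X @ w - y) ** 2)            # cost on the whole training set
    print(f"epoch {epoch}: cost {cost:.4f}")
```

Smaller mini-batches give you more (noisier) updates per epoch; larger ones give fewer, smoother updates.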
This is really a great explanation. What if the convergence doesn’t work so well, can you increase the epochs or just increase the mini-batch size? Thank you.
There are a number of things to try if the convergence is not working well. The first thing is to graph the cost over the epochs to get a sense for what the problem is: does it oscillate? Or does it decrease for a while and then start increasing again? Or is it decreasing, but just not very fast?
The hyperparameters you have to adjust here are the learning rate, the minibatch size and the total number of epochs. If the cost is oscillating, the first things to try would be a lower learning rate and maybe a larger minibatch size. If the cost is decreasing, but not fast enough, then try more epochs and a higher learning rate. Note that Prof Ng will also introduce us to some more sophisticated concepts here in Course 2 like momentum, RMSprop and Adam optimization as well as techniques for dynamically managing the learning rate.
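If it helps, a minimal sketch of that cost-vs-epoch plot with matplotlib (assuming you collect one cost value per epoch into a list during training; the values below are made up so the snippet runs on its own):

```python
import matplotlib.pyplot as plt

# `costs` should hold one (average) cost value per epoch, collected during
# training, e.g. by appending the epoch's cost inside the training loop.
# These numbers are made up just so the snippet runs on its own.
costs = [0.92, 0.55, 0.41, 0.35, 0.33, 0.34, 0.32, 0.31, 0.31, 0.30]

plt.plot(costs)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.title("learning rate = 0.1, mini-batch size = 10")  # made-up hyperparameter values
plt.show()
```

The shape of that curve (oscillating, rising again, or decreasing too slowly) tells you which of the hyperparameters above to adjust first.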
Thank you so much. It helped a lot.