Hello, I want to get more insight about mini-batching as I am not sure if my understanding is correct. I want to know whether the number of gradient descent iterations that will now be run would be the size of the mini-batch in the lecture, which is 5000, or whether gradient descent is run 5000 times, once for each mini-batch. Thank you.
Hello @Nnaemeka_Nwankwo
If your batch of data has 1000 samples, and the mini-batch size is 10, then there will be 100 mini-batches, and there will be 100 gradient descent steps in an epoch that goes through the whole batch of samples.
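In case a concrete sketch helps, here is that counting in Python (just a toy example with made-up array shapes, not the course code):

```python
import numpy as np

m = 1000                 # total number of training samples
mini_batch_size = 10     # samples per mini-batch

num_mini_batches = m // mini_batch_size   # 1000 / 10 = 100 mini-batches
print(num_mini_batches)                   # 100 gradient descent steps per epoch

# Slicing a toy dataset into mini-batches of 10 samples each
X = np.random.randn(m, 5)                 # 5 features per sample (made-up shape)
mini_batches = [X[k * mini_batch_size:(k + 1) * mini_batch_size]
                for k in range(num_mini_batches)]
print(len(mini_batches))                  # 100
```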
Cheers,
Raymond
Thank you so much @rmwkwok, this was so helpful. That is to say, there will be 10 epochs? Also, can I run gradient descent more than 100 times for each epoch?
100 mini-batches per epoch, and 10 samples per mini-batch.
The number of gradient descent steps is equal to the number of mini-batches.
This clarified a lot, thank you @rmwkwok.
But of course the number of total epochs you run is another matter entirely. As Raymond says, the definition of an “epoch” of training is one complete pass through the entire training set (all the minibatches). The number of total epochs you need is determined by how well the convergence works. The entire point of minibatch gradient descent is that you are updating the parameters after each minibatch, so you should be able to converge more quickly with fewer total “epochs”. But if you make the minibatch size too small, then you can also get more oscillations and statistical noise in your updates. So you need to choose the minibatch size correctly. In other words the minibatch size is what Prof Ng calls a “hyperparameter”, which means a value that you need to choose rather than one that can be directly learned through back propagation.
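To make that concrete, here is a minimal runnable sketch of mini-batch gradient descent on a made-up linear regression problem (not the assignment code; all names and numbers are just for illustration). The point is that the parameters are updated once per mini-batch, so one epoch over 1000 samples with a mini-batch size of 10 gives 100 updates:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 1000, 3                                  # 1000 samples, 3 features
X = rng.normal(size=(m, n))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=m)       # noisy linear targets

w = np.zeros(n)                                 # parameters to learn
learning_rate = 0.1                             # hyperparameters you choose
mini_batch_size = 10
num_epochs = 10

for epoch in range(num_epochs):                 # one epoch = one full pass over the data
    perm = rng.permutation(m)                   # shuffle before slicing into mini-batches
    X_shuf, y_shuf = X[perm], y[perm]
    for k in range(0, m, mini_batch_size):      # 100 mini-batches -> 100 updates per epoch
        X_mb = X_shuf[k:k + mini_batch_size]
        y_mb = y_shuf[k:k + mini_batch_size]
        grad = 2 * X_mb.T @ (X_mb @ w - y_mb) / mini_batch_size  # MSE gradient on this mini-batch
        w -= learning_rate * grad               # parameter update after each mini-batch
    cost = np.mean((X @ w - y) ** 2)            # cost on the whole training set
    print(f"epoch {epoch}: cost {cost:.4f}")
```

Smaller mini-batches give you more (noisier) updates per epoch; larger ones give fewer, smoother updates.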
This is really a great explanation. What if the convergence doesn’t work so well, can you increase the epochs or just increase the mini-batch size? Thank you.
There are a number of things to try if the convergence is not working well. The first thing is to graph the cost over the epochs to get a sense for what the problem is: does it oscillate? Or does it decrease for a while and then start increasing again? Or is it decreasing, but just not very fast?
The hyperparameters you have to adjust here are the learning rate, the minibatch size and the total number of epochs. If the cost is oscillating, the first things to try would be a lower learning rate and maybe a larger minibatch size. If the cost is decreasing, but not fast enough, then try more epochs and a higher learning rate. Note that Prof Ng will also introduce us to some more sophisticated concepts here in Course 2 like momentum, RMSprop and Adam optimization as well as techniques for dynamically managing the learning rate.
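If it helps, a minimal sketch of that cost-vs-epoch plot with matplotlib (assuming you collect one cost value per epoch into a list during training; the values below are made up so the snippet runs on its own):

```python
import matplotlib.pyplot as plt

# `costs` should hold one (average) cost value per epoch, collected during
# training, e.g. by appending the epoch's cost inside the training loop.
# These numbers are made up just so the snippet runs on its own.
costs = [0.92, 0.55, 0.41, 0.35, 0.33, 0.34, 0.32, 0.31, 0.31, 0.30]

plt.plot(costs)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.title("learning rate = 0.1, mini-batch size = 10")  # made-up hyperparameter values
plt.show()
```

The shape of that curve (oscillating, rising again, or decreasing too slowly) tells you which of the hyperparameters above to adjust first.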
Thank you so much. It helped a lot.