What is the advantage of Mini batch gradient descent over batch gradient descent?

Isn’t the advantage of mini-batch gradient descent that it speeds up the training process? In that case, why are we trying to “tune” it by using various values (which will further increase the time to train)? How does the model learn different parameters with different minibatch sizes?
In the course it is said that when we use minibatch size = 1 we lose the advantage of vectorizing. In that case, shouldn’t there be a maximum minibatch size which depends on the maximum parallel processing that can be achieved?

The point of minibatch gradient descent is that you get to update the parameters more often, so you should get faster convergence if you choose the minibatch size appropriately. At one extreme you get SGD (batch size = 1), which has the disadvantage you mention: you lose the benefits of vectorization. At the other extreme you get minibatch size = the full training set, which gives you maximum vectorization, but you lose the benefit of more frequent updates to the parameters. The real point is that you hope there is a “Goldilocks” value somewhere in the middle that gives you something close to optimal convergence. As Yann LeCun famously said: “Friends don’t let friends use minibatch sizes > 32” :grinning_face_with_smiling_eyes:
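
To make the trade-off concrete, here is a minimal NumPy sketch of minibatch gradient descent on a simple linear least-squares loss (the function name, data, and hyperparameters are illustrative, not from the course). Each epoch makes roughly n / batch_size parameter updates, so smaller batches mean more frequent but noisier updates, while each minibatch is still processed with vectorized operations:

```python
import numpy as np

def minibatch_gd(X, y, batch_size=32, lr=0.01, epochs=10):
    """Minibatch gradient descent on a linear least-squares loss (illustrative)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = np.random.permutation(n)          # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]              # vectorized over the whole minibatch
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad                       # one parameter update per minibatch
    return w

# batch_size=1 recovers SGD (no vectorization benefit);
# batch_size=n recovers full-batch gradient descent (one update per epoch).
w = minibatch_gd(np.random.randn(1000, 5), np.random.randn(1000), batch_size=32)
```

Tuning batch_size is then just choosing where you sit between those two extremes: enough parallelism per step to keep the hardware busy, but enough updates per epoch to converge quickly.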
