Why is Mini-batch Gradient Descent more efficient?

In my understanding, mini-batch gradient descent breaks the whole training set into many mini-batches and then iterates over them. I would like to know why it is more efficient to use a for loop over mini-batches rather than propagating all the data through in one vectorized pass. Thank you.

Hi. Mini-batch GD breaks the training set into a number of mini-batches.
It then processes a single mini-batch and updates its parameters W1, b1, …, Wn, bn to reduce the cost function.
Mini-batch GD repeats this for every mini-batch, i.e. it processes the mini-batch and then immediately updates its parameters.
In batch GD, by contrast, all the training examples are processed together (which takes a large amount of processing time) before the parameters are updated.
So in mini-batch GD, learning starts shortly after processing just a single mini-batch, rather than waiting for the whole training set to be processed as in batch GD.
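
Here is a minimal sketch of that difference, assuming a generic `compute_gradients` helper and simple parameters `W`, `b` (all hypothetical names, not taken from any assignment code):

```python
import numpy as np

def batch_gd_epoch(X, Y, W, b, lr, compute_gradients):
    # Batch GD: one parameter update per full pass over the training set.
    dW, db = compute_gradients(X, Y, W, b)  # gradients over all m examples
    W = W - lr * dW
    b = b - lr * db
    return W, b

def minibatch_gd_epoch(X, Y, W, b, lr, compute_gradients, batch_size=64):
    # Mini-batch GD: many parameter updates per pass over the training set.
    m = X.shape[1]                           # examples stored as columns
    perm = np.random.permutation(m)          # shuffle before slicing
    X, Y = X[:, perm], Y[:, perm]
    for start in range(0, m, batch_size):
        Xb = X[:, start:start + batch_size]
        Yb = Y[:, start:start + batch_size]
        dW, db = compute_gradients(Xb, Yb, W, b)  # gradients on one mini-batch only
        W = W - lr * dW                           # update right away, without
        b = b - lr * db                           # waiting for the rest of the data
    return W, b
```

The only real difference is where the update happens: inside the loop, so the parameters start moving toward the minimum after the first mini-batch instead of only after the whole set has been processed.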


Thank you @Mihir09, I now see that mini-batch GD starts updating its parameters sooner than batch GD.


It seems like mini-batches would allow us to end training early, assuming the first few batches are representative of the data set as a whole, because we would already have started making progress toward the minimum of the cost function. Is early termination the reason that people say mini-batch gradient descent is “faster” than batch gradient descent?

Yes, the point of mini-batch GD is that you may be able to achieve the same level of convergence with fewer total “epochs” of training. Recall that an “epoch” is one complete pass through the full training set. In full batch GD that means just one parameter update, but in mini-batch GD it means one pass through all the mini-batches, with an update after each one, so the parameters get many more adjustments per epoch.
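
As a toy illustration with made-up numbers (m = 5,000,000 examples and a mini-batch size of 1,000, both hypothetical), one epoch gives batch GD a single parameter update, while mini-batch GD performs 5,000 updates in the same pass:

```python
m, batch_size = 5_000_000, 1_000

updates_per_epoch_batch = 1                    # batch GD: one update per epoch
updates_per_epoch_minibatch = m // batch_size  # mini-batch GD: 5,000 updates per epoch

print(updates_per_epoch_batch, updates_per_epoch_minibatch)  # -> 1 5000
```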