Clarification about weights in mini-batch processing

I'd like to confirm I have this right with respect to how weights are updated in mini-batch processing:

  • prior to processing a mini-batch, all weights start at the same values (shared across all the samples in the batch)
  • after all samples in the mini-batch are processed and the loss is computed, the weights are adjusted
  • for the next mini-batch, the weights for all samples again start from the same point, now with the new values

The weights are independent of the samples: the dimensions of W^{[l]} and b^{[l]} are determined only by the number of input features and the number of neurons in each layer, right?
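
For example, this quick NumPy check (a minimal sketch; the layer sizes here are just made up) shows the parameter shapes coming only from the layer widths, with the same W and b applied to a batch of any size:

```python
import numpy as np

# Hypothetical layer sizes (not from the thread): 4 input features,
# a hidden layer of 3 units, and 1 output unit.
layer_sizes = [4, 3, 1]

# Parameter shapes depend only on consecutive layer widths, never on batch size.
params = {}
for l in range(1, len(layer_sizes)):
    params[f"W{l}"] = np.random.randn(layer_sizes[l], layer_sizes[l - 1]) * 0.01
    params[f"b{l}"] = np.zeros((layer_sizes[l], 1))

# The same W and b process a mini-batch of 8 samples or 256 samples alike;
# only the batch dimension of the activations changes.
for m in (8, 256):
    X = np.random.randn(layer_sizes[0], m)    # shape (n_features, batch_size)
    Z1 = params["W1"] @ X + params["b1"]      # (3, m) via broadcasting
    A1 = np.tanh(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]     # (1, m)
    print(params["W1"].shape, params["b1"].shape, Z2.shape)
```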

The key point about mini-batch gradient descent is that all of the weights are updated after each mini-batch. That is as opposed to “full batch” gradient descent, in which the weights get updated only after processing the entire set of training samples. The terminology is that one “epoch” means one training pass through all the training samples, so in “full batch” the weights get updated once per epoch. In the mini-batch case, one epoch means iterating over all of the mini-batches that together form the full set of training samples, so the weights get updated multiple times per epoch.

That is the primary advantage of mini-batch: when it works well, you can reach the same level of convergence with fewer total epochs of training. Of course, as with everything here, there is no guarantee that always happens, since you can also have more statistical noise in the gradients in the mini-batch case, particularly if the mini-batch size is very small. So you may also need to apply momentum or other techniques to mitigate that.
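
To make the update schedule concrete, here is a minimal NumPy sketch (the linear model, loss, and hyperparameters are placeholders, not anything specific from the course). The weights are updated once per mini-batch, so multiple times per epoch; setting batch_size = m would recover the “full batch” behavior of one update per epoch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression problem (placeholder data just to drive the loop).
m, n_features = 1024, 5
X = rng.normal(size=(m, n_features))
y = X @ rng.normal(size=n_features) + 0.1 * rng.normal(size=m)

W = np.zeros(n_features)                   # one shared weight vector for all samples
lr, batch_size, n_epochs = 0.05, 64, 5

for epoch in range(n_epochs):
    perm = rng.permutation(m)              # reshuffle the samples each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        err = Xb @ W - yb                  # residuals on this mini-batch only
        grad = 2.0 / len(idx) * Xb.T @ err # MSE gradient from this mini-batch
        W -= lr * grad                     # weights updated after EVERY mini-batch
    print(f"epoch {epoch}: full-set MSE = {np.mean((X @ W - y) ** 2):.4f}")
```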

Thanks very much - helps a lot.