Do we use the value of W from the previous batch for the next batch in mini-batch gradient descent?

In mini-batch gradient descent, do we use the updated W from the previous batch as the input to the next batch, or do we calculate W for every batch and then take the average?

Yes, and that is why mini-batch GD is useful in most cases: we apply the updates to the parameters (all the W and b values) after every mini-batch. So the updates happen more quickly and, even though they may have a bit more statistical “noise” in them, the overall convergence happens after fewer total “epochs” or passes through the entire training dataset.
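
To make the sequencing concrete, here is a minimal Python sketch of one epoch of mini-batch GD (`batches` and `compute_gradients` are placeholder names, not from the course code). The point is that W and b carry straight over from one update to the next; there is no averaging of per-batch W values:

```python
def minibatch_gd_epoch(W, b, batches, alpha, compute_gradients):
    """One epoch of mini-batch gradient descent.

    W and b carry over from batch to batch; each update starts from
    the parameters left by the previous mini-batch.
    """
    for X_batch, Y_batch in batches:
        # Gradients are computed on the current mini-batch only.
        dW, db = compute_gradients(W, b, X_batch, Y_batch)
        # Apply the update immediately, before seeing the next batch.
        W = W - alpha * dW
        b = b - alpha * db
    return W, b
```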

With these exponentially weighted averages, does it mean that

W[next batch] = beta * W[previous batch] + (1 - beta) * W[current batch]

and that beta adjusts the number of previous batches to consider?
Also, why are we using dW instead of W? We want the best value of W after every mini-batch, and dW only tells us the change.

The slide you are showing is the generic implementation of exponentially weighted averages. The theta values there are temperatures sampled at some time interval.
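
For reference, the recurrence on that slide (with $\theta_t$ the sampled temperature at time $t$ and $v_t$ the running average) is:

$$v_t = \beta \, v_{t-1} + (1 - \beta)\,\theta_t, \qquad v_0 = 0$$

where $\beta$ controls the effective window: roughly the last $1/(1-\beta)$ values contribute meaningfully to the average (e.g. $\beta = 0.9$ averages over about the last 10 values).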

If the question is about how we use EWAs to implement minibatch gradient descent, here’s a more directly relevant slide from the lecture on Momentum:

There the inputs are dW, computed from the current minibatch, and W, which is the updated W from the last minibatch. We then compute a “smoothed” version of dW by taking the EWA of the last few values of dW: vdW holds that running average, and we extend it with the new dW from this minibatch. Finally, we apply the smoothed gradient in the update formula to get the new value of W after the current minibatch, and similarly for b and db.
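
As a minimal sketch of that update (a standard gradient-descent-with-momentum step consistent with the description above; `alpha` stands for the learning rate):

```python
def momentum_update(W, b, dW, db, vdW, vdb, beta, alpha):
    """One parameter update with momentum.

    dW, db: gradients from the current mini-batch.
    vdW, vdb: EWAs of past gradients, carried across mini-batches.
    """
    # Smooth the current gradient with the EWA of previous gradients.
    vdW = beta * vdW + (1 - beta) * dW
    vdb = beta * vdb + (1 - beta) * db
    # Apply the smoothed gradients in the usual update rule.
    W = W - alpha * vdW
    b = b - alpha * vdb
    return W, b, vdW, vdb
```

Note that vdW and vdb are returned along with W and b: like the parameters themselves, the running averages persist from one mini-batch to the next.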