Stochastic Gradient Descent

Can someone explain the reason for the noise in the cost function?

My intuition is that as we iterate over an epoch, the model tries to minimize the cost for each mini-batch in turn. But in the process, running gradient descent on one particular mini-batch also changes the cost on some of the previous mini-batches (the ones we have already taken a gradient step on), so the cost curve is noisy. A small sketch of what I mean is below.
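To make this concrete, here is a minimal sketch in plain NumPy (linear regression with MSE; the synthetic data, learning rate, and names are all made up for illustration, not from any particular course): after a step on batch B, the loss measured on the earlier batch A often goes back up.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Mean-squared error and its gradient for a linear model y ~ X @ w."""
    err = X @ w - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

# Two mini-batches of synthetic data (illustrative only).
X_a, X_b = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_a = X_a @ true_w + rng.normal(scale=0.5, size=8)
y_b = X_b @ true_w + rng.normal(scale=0.5, size=8)

w, lr = np.zeros(3), 0.1

# Step on batch A, then record A's loss.
_, g = loss_and_grad(w, X_a, y_a)
w -= lr * g
loss_a_before, _ = loss_and_grad(w, X_a, y_a)

# Step on batch B, then measure A's loss again: it can go back up,
# which is exactly the inter-batch interference described above.
_, g = loss_and_grad(w, X_b, y_b)
w -= lr * g
loss_a_after, _ = loss_and_grad(w, X_a, y_a)

print(f"loss on batch A after its own step: {loss_a_before:.4f}")
print(f"loss on batch A after a step on B:  {loss_a_after:.4f}")
```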

Yes, the cost value is different for each example and for each batch, so it goes up and down; when you use batches, the cost you plot is just an average over the examples in that batch.

Generally speaking, it fluctuates up and down!

Yes, it’s a basic principle of statistics that estimates get smoother (lower variance) the larger your sample size is. So when you switch from full-batch gradient descent to mini-batch gradient descent, you introduce statistical noise by estimating the cost and the gradient from sampled subsets of your training data. The smaller you make the subsets, the noisier it gets. In the limit you have Stochastic Gradient Descent, which by definition uses a batch size of 1 sample, so that maximizes the noise (see the sketch below).
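Here is a small sketch of that sample-size point, again in plain NumPy with made-up data and hyperparameters: the same training loop is run with a full batch, a mini-batch of 64, and a batch size of 1, and the spread of the recorded cost after rough convergence grows as the batch shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1024, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.3, size=n)

def train(batch_size, steps=300, lr=0.05):
    """Mini-batch gradient descent on MSE; returns the per-step batch cost."""
    w = np.zeros(d)
    costs = []
    for _ in range(steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        err = X[idx] @ w - y[idx]
        costs.append(np.mean(err ** 2))            # cost on this batch
        w -= lr * 2 * X[idx].T @ err / batch_size  # gradient step
    return np.array(costs)

for bs in (n, 64, 1):  # full batch, mini-batch, pure SGD
    tail = train(bs)[150:]  # look at the fluctuation after rough convergence
    print(f"batch size {bs:4d}: std of cost over last 150 steps = {tail.std():.4f}")
```

With anything like these settings, the printed standard deviation should shrink as the batch size grows, which is the smoothing effect of averaging over more samples.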
