In the Optimization Methods exercise, the implementation for GD and SGD takes the cost average at the very end. I don't believe we've done this before in other notebooks. Why does this notebook take the average?
The cost is normally defined as the average of the loss function values across all the samples in the epoch. They have written the compute_cost utility function that they gave you here (you can find the source by clicking “File → Open” and then opening the appropriate Python file) so that it returns the sum of the costs across the current batch of samples. With that design, they can use the same subroutine in all three cases: full batch gradient descent, mini-batch GD, and stochastic GD. In the latter two cases, you keep a running sum of the costs across all the minibatches, and then divide by the total number of samples once, at the end of the epoch, to get the average. That final division is the "average at the very end" you are asking about.
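To make that pattern concrete, here is a minimal, self-contained sketch (a toy logistic-regression loop with a hypothetical helper named compute_cost_sum, not the notebook's actual code): the cost helper returns a per-batch sum, the epoch loop keeps a running total, and only at the end is that total divided by the number of samples m.

```python
import numpy as np

def compute_cost_sum(A, Y):
    """Cross-entropy SUMMED (not averaged) over the samples in this batch.
    Returning the sum lets the caller accumulate across minibatches."""
    return float(np.sum(-(Y * np.log(A) + (1 - Y) * np.log(1 - A))))

# Toy data, illustrative only; m is deliberately not a multiple of the batch size.
rng = np.random.default_rng(0)
m = 7
X = rng.standard_normal((2, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W, b = rng.standard_normal((1, 2)) * 0.01, 0.0
learning_rate, mini_batch_size = 0.1, 3

for epoch in range(100):
    cost_total = 0.0
    for start in range(0, m, mini_batch_size):
        Xb = X[:, start:start + mini_batch_size]
        Yb = Y[:, start:start + mini_batch_size]
        A = 1.0 / (1.0 + np.exp(-(W @ Xb + b)))    # forward pass (sigmoid)
        cost_total += compute_cost_sum(A, Yb)      # running SUM of per-sample losses
        dZ = A - Yb                                # backward pass
        W -= learning_rate * (dZ @ Xb.T) / Xb.shape[1]
        b -= learning_rate * float(np.mean(dZ))
    cost_avg = cost_total / m                      # divide by m once, at the end of the epoch

print(f"final average cost per sample: {cost_avg:.4f}")
```

The same loop body works whether a "minibatch" is the full batch, a slice of it, or a single sample; only the slicing changes, never the cost bookkeeping.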
Note that there is a more complicated way to deal with this: you could have compute_cost compute the average over whatever inputs it is given, add those per-minibatch averages up in the outer loop, and at the end divide by the number of minibatches. That gives the same result, the average over the full batch, except for one little problem: the math doesn't work if the minibatch size does not evenly divide the total batch size, because the smaller final minibatch then gets the same weight as the full-sized ones. The short example below makes that concrete.
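Here is a tiny numeric illustration (made-up loss values, not from the notebook) of how averaging the per-minibatch averages drifts from the true per-sample average when the sizes are uneven, while summing and dividing by the sample count does not:

```python
import numpy as np

# Per-sample losses for 5 samples, split into minibatches of size 3 and 2.
losses = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
mb1, mb2 = losses[:3], losses[3:]

true_average = losses.mean()                              # 36/5 = 7.2
avg_of_averages = (mb1.mean() + mb2.mean()) / 2           # (2.0 + 15.0)/2 = 8.5  -> wrong
sum_then_divide = (mb1.sum() + mb2.sum()) / losses.size   # 36/5 = 7.2            -> correct

print(true_average, avg_of_averages, sum_then_divide)
```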
So there are really two problems with that approach: even when the minibatch size evenly divides the full batch, it's more complicated and you have to think harder to convince yourself it works; and when the sizes don't divide evenly, it simply gives the wrong answer.
The way they did it is simple and clearly correct in all cases.