Hi, I’m wondering why the explanation of Batch Gradient Descent vs. Stochastic Gradient Descent reads:
```python
cost += compute_cost(a, Y)
```
In Course 1 we normally report the cost after each iteration, not an accumulated version. Was this line used to generate the plots and accidentally left in?
Thanks.
That is probably a mistake in the instructions portion. For “batch” gradient descent, the cost is computed over the whole batch in each iteration, so a plain assignment (not `+=`) is what you’d expect.
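Here’s a minimal, self-contained sketch of that pattern on a toy logistic-regression model. The model and my `compute_cost` here are just stand-ins for illustration, not the assignment’s actual helpers:

```python
import numpy as np

def compute_cost(a, Y):
    # mean cross-entropy over whatever set of examples was passed in
    # (a toy stand-in for the assignment's compute_cost, not its real code)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(a) + (1 - Y) * np.log(1 - a)) / m)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100))              # 100 examples, 2 features
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)  # toy labels
w, b = np.zeros((1, 2)), 0.0

for i in range(10):
    a = 1 / (1 + np.exp(-(w @ X + b)))  # forward pass on the WHOLE batch
    cost = compute_cost(a, Y)           # plain assignment; nothing to accumulate
    dz = a - Y
    w -= 0.1 * (dz @ X.T) / X.shape[1]  # gradient step on the full batch
    b -= 0.1 * float(np.mean(dz))
    print(f"iteration {i}: cost = {cost:.4f}")
```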
The thing that motivated the mistake is what happens in the minibatch code that comes later. Take a look at the template code they gave you and you’ll see what really happens: in order to get a cost value that is useful for “apples to apples” comparisons, they accumulate the total cost over all the minibatches and then divide by the total number of samples, giving an average cost per sample over the entire training set. They were just being a little sloppy in the instructions.
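And here’s a hedged sketch of that minibatch pattern under the same toy setup. The summed-cost convention and the epoch-level averaging mirror what I described above; the assignment’s actual function names and signatures may differ:

```python
import numpy as np

def compute_cost_sum(a, Y):
    # SUMMED (not averaged) cross-entropy over this minibatch, so totals
    # from minibatches of different sizes can simply be added together
    return float(-np.sum(Y * np.log(a) + (1 - Y) * np.log(1 - a)))

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 100))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)
w, b = np.zeros((1, 2)), 0.0
m, batch_size = X.shape[1], 16

for epoch in range(10):
    cost_total = 0.0
    perm = rng.permutation(m)                  # reshuffle each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, Yb = X[:, idx], Y[:, idx]
        a = 1 / (1 + np.exp(-(w @ Xb + b)))    # forward pass on one minibatch
        cost_total += compute_cost_sum(a, Yb)  # accumulate across minibatches
        dz = a - Yb
        w -= 0.1 * (dz @ Xb.T) / Xb.shape[1]
        b -= 0.1 * float(np.mean(dz))
    cost_avg = cost_total / m  # average cost per sample for the whole epoch
    print(f"epoch {epoch}: average cost per sample = {cost_avg:.4f}")
```

Note the design choice: the per-minibatch cost is a sum rather than a mean, so the totals can be added up and divided once by `m` at the end of the epoch, which makes the result directly comparable to the per-iteration cost you get from batch gradient descent.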
Oh that’s it, thanks @paulinpaloalto!