Hi, I’m wondering why the explanation of Batch Gradient Descent vs. Stochastic Gradient Descent reads:
```python
cost += compute_cost(a, Y)
```
In Course 1 we normally report the cost after each iteration, not an accumulated version. Was this line used to generate the plots and accidentally left in?
Thanks.
That is probably a mistake in the instructions portion. For “batch” gradient descent, the cost is computed over the whole batch in each iteration, so a plain assignment (not `+=`) is what you’d expect.
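Here’s a minimal, self-contained sketch of that pattern on a toy logistic-regression model. The model and my `compute_cost` here are just stand-ins for illustration, not the assignment’s actual helpers:

```python
import numpy as np

def compute_cost(a, Y):
    # mean cross-entropy over whatever set of examples was passed in
    # (a toy stand-in for the assignment's compute_cost, not its real code)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(a) + (1 - Y) * np.log(1 - a)) / m)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100))              # 100 examples, 2 features
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)  # toy labels
w, b = np.zeros((1, 2)), 0.0

for i in range(10):
    a = 1 / (1 + np.exp(-(w @ X + b)))  # forward pass on the WHOLE batch
    cost = compute_cost(a, Y)           # plain assignment; nothing to accumulate
    dz = a - Y
    w -= 0.1 * (dz @ X.T) / X.shape[1]  # gradient step on the full batch
    b -= 0.1 * float(np.mean(dz))
    print(f"iteration {i}: cost = {cost:.4f}")
```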
The thing that motivated the mistake is what happens in the minibatch code that comes later. Take a look at the template code they gave you and you’ll see what really happens: in order to get a cost value that is useful for “apples to apples” comparisons, they accumulate the total cost over all the minibatches and then divide by the total number of samples, giving an average cost per sample over the entire training set. They were just being a little sloppy in the instructions.
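And here’s a hedged sketch of that minibatch pattern under the same toy setup. The summed-cost convention and the epoch-level averaging mirror what I described above; the assignment’s actual function names and signatures may differ:

```python
import numpy as np

def compute_cost_sum(a, Y):
    # SUMMED (not averaged) cross-entropy over this minibatch, so totals
    # from minibatches of different sizes can simply be added together
    return float(-np.sum(Y * np.log(a) + (1 - Y) * np.log(1 - a)))

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 100))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(float)
w, b = np.zeros((1, 2)), 0.0
m, batch_size = X.shape[1], 16

for epoch in range(10):
    cost_total = 0.0
    perm = rng.permutation(m)                  # reshuffle each epoch
    for start in range(0, m, batch_size):
        idx = perm[start:start + batch_size]
        Xb, Yb = X[:, idx], Y[:, idx]
        a = 1 / (1 + np.exp(-(w @ Xb + b)))    # forward pass on one minibatch
        cost_total += compute_cost_sum(a, Yb)  # accumulate across minibatches
        dz = a - Yb
        w -= 0.1 * (dz @ Xb.T) / Xb.shape[1]
        b -= 0.1 * float(np.mean(dz))
    cost_avg = cost_total / m  # average cost per sample for the whole epoch
    print(f"epoch {epoch}: average cost per sample = {cost_avg:.4f}")
```

Note the design choice: the per-minibatch cost is a sum rather than a mean, so the totals can be added up and divided once by `m` at the end of the epoch, which makes the result directly comparable to the per-iteration cost you get from batch gradient descent.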
Oh that’s it, thanks @paulinpaloalto!