Hello,

In the Course 2 Week 2 assignment, Optimization Methods, the code for computing the cost is “cost += compute_cost(a, Y)” for both Batch Gradient Descent and Stochastic Gradient Descent. I don’t understand why it is cost = cost + compute_cost(a, Y). Is the code incorrect?

Thanks for your answer.

For batch gradient descent, there is no need to accumulate the cost as the pseudocode does.

For batch gradient descent, we use all our examples each time, so one iteration is one epoch. We could divide the cost by m to get an average cost per training example.
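A minimal sketch of this, using an assumed toy logistic-regression setup rather than the assignment's actual helper functions: all m examples are used in one vectorized forward pass, so the cost is computed once per iteration (no accumulation), and dividing by m gives the average cost per example.

```python
import numpy as np

# Toy data (assumed for illustration only).
np.random.seed(0)
m = 8                                   # number of training examples
X = np.random.randn(2, m)               # features, shape (n_x, m)
Y = (np.random.rand(1, m) > 0.5) * 1.0  # labels, shape (1, m)
w = np.zeros((2, 1))
b = 0.0
lr = 0.1
costs = []

for epoch in range(100):
    # Forward pass over ALL m examples at once: one iteration == one epoch.
    a = 1 / (1 + np.exp(-(w.T @ X + b)))
    # Total cross-entropy cost, divided by m for the average cost per example.
    cost = -np.sum(Y * np.log(a) + (1 - Y) * np.log(1 - a)) / m
    costs.append(cost)
    # Backward pass and parameter update.
    dz = a - Y
    w -= lr * (X @ dz.T) / m
    b -= lr * np.sum(dz) / m
```

The key point is that `cost` is assigned fresh each epoch; there is no `cost +=` because nothing is being summed across iterations.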

For stochastic gradient descent, we use one training example, so to traverse all our examples, we need m iterations. It might make sense to accumulate costs per example to calculate an average cost per example for one epoch, i.e., division by m in the outer loop.
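A sketch of that version, again with an assumed toy logistic-regression setup rather than the assignment's helpers: the inner loop visits one example at a time, the per-example costs are accumulated across the epoch, and the division by m happens once in the outer loop.

```python
import numpy as np

# Toy data (assumed for illustration only).
np.random.seed(0)
m = 8                                   # number of training examples
X = np.random.randn(2, m)               # features, shape (n_x, m)
Y = (np.random.rand(1, m) > 0.5) * 1.0  # labels, shape (1, m)
w = np.zeros((2, 1))
b = 0.0
lr = 0.1
epoch_costs = []

for epoch in range(100):
    cost_total = 0.0
    for j in range(m):                  # one example per inner iteration
        x_j = X[:, j:j + 1]
        y_j = Y[:, j:j + 1]
        a = 1 / (1 + np.exp(-(w.T @ x_j + b)))
        # Accumulate the per-example cost across the epoch ...
        cost_total += (-(y_j * np.log(a) + (1 - y_j) * np.log(1 - a))).item()
        # Update parameters from this single example.
        dz = a - y_j
        w -= lr * (x_j @ dz.T)
        b -= lr * dz.item()
    # ... then divide by m in the outer loop: average cost per example.
    epoch_costs.append(cost_total / m)
```

So the `cost += ...` line only makes sense in this stochastic setting, where m inner iterations together make up one epoch and the running total is averaged afterwards.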


Thank you for your answer!
