In the lecture, the cost function is the sum of the losses divided by the number of examples.
So I was thinking of dividing by two after tf.reduce_sum(…).
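For reference, the cost from the lecture that I mean is (if I understand it correctly):

$$J = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)$$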
It’s only graded as correct if you don’t divide.
Yes, it is true that the cost is the average of the loss over all the examples in the training set. The problem is that things get a bit more complicated once you switch to supporting Minibatch Gradient Descent. The way they handle that is to have the lower level cost function return the sum of the losses rather than the average. They then accumulate those sums over all the minibatches, and when they get to the end of the full epoch (all the minibatches), they divide the accumulated sum by m to get the overall average. The reason the average doesn’t work at the level of the minibatch is that the minibatches are not all the same size when the minibatch size does not evenly divide the total training set size. You can’t take the average of the averages in that case, right? The math doesn’t work …
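Here is a minimal sketch of that idea in TensorFlow. This is not the assignment’s actual code, and the function and variable names (sum_of_losses, epoch_total, etc.) are made up for illustration; the point is just that the per-minibatch helper returns a sum via tf.reduce_sum, and the epoch loop divides the accumulated total by m exactly once at the end:

```python
import tensorflow as tf

# Hypothetical per-minibatch helper (names made up): note tf.reduce_sum, not
# tf.reduce_mean, so it returns the SUM of the per-example losses.
def sum_of_losses(labels, logits):
    per_example = tf.keras.losses.categorical_crossentropy(
        labels, logits, from_logits=True)
    return tf.reduce_sum(per_example)

# Toy data: m = 5 examples split into minibatches of size 2, 2 and 1.
m = 5
labels = tf.one_hot([0, 1, 2, 1, 0], depth=3)
logits = tf.random.normal((m, 3))
minibatches = [(labels[i:i + 2], logits[i:i + 2]) for i in range(0, m, 2)]

# Accumulate the summed losses over the epoch, then divide once by m.
epoch_total = 0.0
for mb_labels, mb_logits in minibatches:
    epoch_total += sum_of_losses(mb_labels, mb_logits)
epoch_cost = epoch_total / m   # true average loss over all m examples

# For comparison: averaging the per-minibatch averages gives a different
# (wrong) answer here, because the last minibatch only has 1 example.
mean_of_means = sum(
    float(tf.reduce_mean(
        tf.keras.losses.categorical_crossentropy(l, x, from_logits=True)))
    for l, x in minibatches) / len(minibatches)

print(float(epoch_cost), mean_of_means)  # these generally differ
```

Notice that epoch_cost weights every example equally, while mean_of_means gives the single example in the last minibatch as much weight as a whole minibatch of two, which is why the two numbers disagree.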
Take a look at the details of how the cost is handled in the Optimization assignment in C2 W2 if you missed that level of detail the first time through.
The instructions specifically tell you to use reduce_sum, for the reason that I explained in my previous reply.