At the end of the model function we compute the average of the cost, but I don’t understand why. Previously we used only the cost, so why do we use its average in this case?

The cost is always the average of the loss values across all the samples in the batch (the full training set). The computation looks a little different here because we are doing “minibatch” gradient descent: the training set is split into minibatches, so we need to add up the costs of all the minibatches and then divide by the total number of samples to get the usual meaning of the J value (the loss averaged across all the samples).

You can check the compute_cost implementation to see that it does not take the average there; it returns just the sum over the minibatch.
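Here is a minimal sketch of that arithmetic, not the course's actual code. The helper names `minibatch_cost_sum` and `epoch_average_cost` are hypothetical, and the per-sample losses are made-up numbers; the only point is that summing within each minibatch and dividing once by the total number of samples gives the ordinary average over the whole training set.

```python
import numpy as np

def minibatch_cost_sum(losses):
    # Hypothetical stand-in for compute_cost: sums the per-sample
    # losses of one minibatch and does NOT divide by its size.
    return float(np.sum(losses))

def epoch_average_cost(per_sample_losses, batch_size):
    # Accumulate the summed cost of every minibatch, then divide once
    # by the total number of samples m.
    m = per_sample_losses.shape[0]
    cost_total = 0.0
    for start in range(0, m, batch_size):
        minibatch = per_sample_losses[start:start + batch_size]
        cost_total += minibatch_cost_sum(minibatch)
    return cost_total / m

# Made-up per-sample losses: 10 samples, so the last minibatch has only 2.
losses = np.arange(10.0)
avg_cost = epoch_average_cost(losses, batch_size=4)

# For comparison: averaging the per-minibatch means gives too much
# weight to the smaller final minibatch.
mean_of_means = float(np.mean([losses[i:i + 4].mean() for i in range(0, 10, 4)]))

print(avg_cost)       # 4.5, the same as losses.mean()
print(mean_of_means)  # about 5.17, not the true average
```

Dividing once at the end is what keeps the result correct when the last minibatch is smaller than the others.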
