In the Week 3 assignment, Part 3.3 - Train the Model, within the model function, epoch_cost is computed as: epoch_cost += minibatch_cost / minibatch_size
I think it should instead be computed as:
num_minibatch = np.ceil(num_examples / minibatch_size)
epoch_cost += minibatch_cost / num_minibatch
This is because minibatch_cost, as returned by compute_cost, has already been averaged over the size of the minibatch. So when accumulating epoch_cost, each minibatch_cost should be divided by the number of minibatches, not by the minibatch size.
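To make this concrete, here is a minimal runnable sketch of just the accumulation logic (the m = 1080 and minibatch_size = 32 figures and the random minibatch_costs are hypothetical stand-ins, not the assignment's actual values):

```python
import numpy as np

m = 1080                                             # hypothetical number of training examples
minibatch_size = 32
num_minibatch = int(np.ceil(m / minibatch_size))     # 34 minibatches per epoch

# Pretend these are the values compute_cost returned for each minibatch;
# each one is already a per-example average over that minibatch.
minibatch_costs = np.random.uniform(0.5, 0.7, size=num_minibatch)

# Average the per-minibatch averages over the number of minibatches,
# not over the minibatch size:
epoch_cost = 0.0
for minibatch_cost in minibatch_costs:
    epoch_cost += minibatch_cost / num_minibatch

print(epoch_cost)   # stays in the same range as the compute_cost values
```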
As long as the mini-batch cost is computed correctly (it's the one used in back-prop), I hadn't given much thought to how the epoch cost is accumulated when using mini-batches.
I’m also interested in hearing other opinions about this.
I think @Damon is right, at least if you expect epoch_cost to consistently be on the same order of magnitude as the values returned by compute_cost (in this assignment, minibatch_size and num_minibatch happen to be very close, so the difference is easy to miss).
Although scaling by a constant factor doesn't change the shape of the learning curve, dividing by minibatch_size doesn't seem like a meaningful normalization.
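A quick numeric check of that constant factor, using hypothetical sizes (m = 1080 and minibatch_size = 32 are illustrative, not taken from the assignment):

```python
import numpy as np

m, minibatch_size = 1080, 32                         # hypothetical sizes
num_minibatch = int(np.ceil(m / minibatch_size))     # 34 minibatches

mean_cost = 0.6   # some per-example-averaged minibatch cost
# Dividing the summed minibatch costs by minibatch_size (current code)
# vs. by num_minibatch (proposed fix) differs only by a constant factor:
current  = mean_cost * num_minibatch / minibatch_size   # 0.6375
proposed = mean_cost                                    # 0.6
print(current, proposed)   # close here because 32 and 34 nearly coincide
```

That near-coincidence is presumably why the discrepancy went unnoticed; with a very different minibatch_size, the reported epoch_cost would be off by a large constant factor.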