Hi,
I have two small questions about the model implementation in "# 7 - Learning Rate Decay and Scheduling" in the Week 2, Exercise 2 problem. These aren't graded exercises, but I was confused about why the code is implemented this way.
The model's average cost is calculated as cost_total / m. Wouldn't this only be correct for stochastic gradient descent? If cost_total is only updated once per minibatch, shouldn't the denominator for cost_avg be the number of minibatches, i.e. (m / mini_batch_size)?
cost_avg = cost_total / m
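To make my confusion concrete, here's a toy numeric sketch of the arithmetic I have in mind (my own example, not the notebook's code; the two conventions for compute_cost shown below are assumptions):

import numpy as np

m = 8                              # total training examples
mini_batch_size = 2                # so there are m / mini_batch_size = 4 minibatches
losses = np.arange(1.0, m + 1.0)   # per-example losses, purely for illustration
minibatches = losses.reshape(-1, mini_batch_size)

# If compute_cost returned the MEAN over a minibatch, cost_total would
# accumulate one average per minibatch, and the overall average would
# need the number of minibatches in the denominator:
cost_total_from_means = sum(batch.mean() for batch in minibatches)
print(cost_total_from_means / (m / mini_batch_size))   # 4.5, the true mean

# If compute_cost instead returned the SUM over a minibatch,
# then dividing by m recovers the per-example average:
cost_total_from_sums = sum(batch.sum() for batch in minibatches)
print(cost_total_from_sums / m)                        # 4.5 as well

So my question boils down to which of these two conventions compute_cost follows here.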
Second, why is the backward propagation function called with minibatch_X instead of a3, the output of the forward propagation function? I believe a3 is what was passed in earlier exercises.
for minibatch in minibatches:

    # Select a minibatch
    (minibatch_X, minibatch_Y) = minibatch

    # Forward propagation
    a3, caches = forward_propagation(minibatch_X, parameters)

    # Compute cost and add to the cost total
    cost_total += compute_cost(a3, minibatch_Y)

    # Backward propagation
    grads = backward_propagation(minibatch_X, minibatch_Y, caches)
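For context, here's roughly what I imagine backward_propagation doing internally (a hand-rolled sketch with an assumed cache layout, assumed ReLU hidden layers, and a sigmoid/softmax output; not the actual notebook code), which shows where I'd expect a3 versus minibatch_X to enter:

import numpy as np

def backward_propagation_sketch(X, Y, caches):
    # Assumed cache layout: pre-activations, activations, and weights
    # saved by forward prop. The real notebook may order these differently.
    (z1, a1, W1, z2, a2, W2, z3, a3, W3) = caches
    m = X.shape[1]

    # a3 is used here, but it comes out of `caches`, not as a direct argument:
    dz3 = a3 - Y                  # sigmoid/softmax + cross-entropy shortcut
    dW3 = (1. / m) * dz3 @ a2.T

    da2 = W3.T @ dz3
    dz2 = da2 * (a2 > 0)          # ReLU derivative (assumption)
    dW2 = (1. / m) * dz2 @ a1.T

    da1 = W2.T @ dz2
    dz1 = da1 * (a1 > 0)
    # X appears only here: the first layer's weight gradient needs the raw inputs.
    dW1 = (1. / m) * dz1 @ X.T
    return dW1, dW2, dW3

If that mental model is right, then X really is needed for dW1 while a3 travels inside caches, so I'd just like to confirm that's the reason for the signature.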
Thanks