I have two questions about the DLS Course 2 Week 3 programming exercises, Exercise 6 - compute_cost and 3.3 - Train the Model.
The first one:
I passed Exercise 6 with the code below.
cost = tf.reduce_sum(tf.keras.losses.categorical_crossentropy(tf.transpose(labels), tf.transpose(logits), from_logits=True), axis=0)
However, I have a feeling that this is not the usual cost function formula, because the return value here seems to be the sum of the losses over the training examples, not their average.
Is a division by the mini-batch size built into this function somewhere?
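To illustrate what I mean, here is a small check I ran (the tensor values are made up; the shapes follow the notebook's convention of (classes, examples), which is why the transposes are needed):

```python
import tensorflow as tf

# Made-up toy tensors shaped (classes, examples), as in the notebook,
# so the transposes below turn them into (examples, classes).
labels = tf.constant([[1., 0.], [0., 1.], [0., 0.]])   # 3 classes, 2 examples
logits = tf.constant([[2., 0.], [1., 3.], [0., 1.]])

per_example = tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)

print(per_example.shape)           # (2,) -- one loss per example, no averaging yet
print(tf.reduce_sum(per_example))  # what my graded code returns: a SUM over examples
print(tf.reduce_mean(per_example)) # what the usual cost (1/m * sum) would give
```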
The second one:
In the cell defining the model() function in 3.3 - Train the Model, there is a line like this:
epoch_cost /= m
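For context, the loop around this line looks roughly like the sketch below (paraphrased from the notebook from memory, with the forward pass and optimizer step replaced by a dummy cost, so details may differ):

```python
m = 1024
minibatches = range(16)       # stands in for 16 mini-batches of size 64

epoch_cost = 0.
for minibatch in minibatches:
    minibatch_cost = 42.0     # stand-in for compute_cost(...) on this batch
    epoch_cost += minibatch_cost

epoch_cost /= m               # the line my question is about
```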
The “epoch_cost” calculated here is the sum of the costs over all mini-batches. For example, if there are 1024 training examples and the mini-batch size is 64, epoch_cost is the sum of 16 per-mini-batch costs. If each mini-batch's cost is calculated as “(the sum of that batch's per-example losses) / 64”, then the average cost per epoch should be “(the sum of all per-example losses) / 1024”, i.e. “(the sum of the per-mini-batch costs) / 16”.
But that is not the case with epoch_cost /= m.
I believe this ‘m’ is defined as the number of examples in the entire training set. So, in the example above, the code seems to compute “(the sum of the per-mini-batch costs) / 1024”. In that case, the reported cost would come out smaller than expected (by a factor of 64, the mini-batch size, under the assumption above).
If the return value “cost” in the first question is indeed the sum of the losses over all training examples, then this notebook happens to calculate epoch_cost correctly. (Because minibatch_cost then represents the sum of the per-example losses within the mini-batch, not the cost as usually defined.)
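Here is a quick numeric check of this bookkeeping (the loss values are random placeholders, not the notebook's data):

```python
import numpy as np

rng = np.random.default_rng(0)
m, batch_size = 1024, 64
losses = rng.random(m)                    # made-up per-example losses
batches = losses.reshape(-1, batch_size)  # 16 mini-batches of 64

# Convention A: compute_cost returns the SUM of the batch's losses
# (what the reduce_sum version in my first question actually does).
epoch_cost_a = batches.sum(axis=1).sum() / m
print(np.isclose(epoch_cost_a, losses.mean()))  # True: dividing by m is correct

# Convention B: compute_cost returns the per-batch MEAN instead.
# Then one would need to divide by the number of mini-batches (16), not by m.
epoch_cost_b = batches.mean(axis=1).sum() / m
print(np.isclose(epoch_cost_b, losses.mean() / batch_size))  # True: 64x too small
```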
However, I am not very comfortable with this notebook's setup, so I am not at all confident in my reasoning. Please let me know if I have misunderstood anything.