3.2 Compute the Total Loss
All you have to do now is define the loss function you're going to use. In this case, since it is a classification problem with 6 labels, categorical cross-entropy will work!
You are used to computing the cost value by summing the losses over the whole batch of samples (i.e. all mini-batches), then dividing the sum by the total number of samples. Here, you will achieve this in two steps.
In step 1, the compute_total_loss function will only take care of summing the losses from one mini-batch of samples. Then, as you train the model (in section 3.3), which calls this compute_total_loss function once per mini-batch, step 2 is done by accumulating the sums from the mini-batches and finishing with a division by the total number of samples to get the final cost value.
Computing the "total loss" instead of the "mean loss" in step 1 ensures that the final cost value is consistent. For example, if the mini-batch size is 4 but there are only 5 samples in the whole batch, then the last mini-batch will have just 1 sample. If the losses for the 5 samples are [0, 1, 2, 3, 4] respectively, the final cost should be their average, which is 2. The "total loss" approach gives exactly that. The "mean loss" approach, however, first gives 1.5 and 4 for the two mini-batches, and then 2.75 after averaging them, which differs from the desired result of 2. Therefore, the "total loss" approach is adopted here.
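As a quick sanity check of that arithmetic, here is a tiny sketch in plain Python (not part of the assignment code) comparing the two approaches on those five losses:

```python
# Per-sample losses from the example above: 5 samples, mini-batch size 4
losses = [0., 1., 2., 3., 4.]
minibatches = [losses[0:4], losses[4:5]]            # last mini-batch has only 1 sample

# "Total loss" approach: sum within each mini-batch, divide once by the total sample count
cost_total = sum(sum(mb) for mb in minibatches) / len(losses)   # (6 + 4) / 5 = 2.0

# "Mean loss" approach: average each mini-batch, then average the averages
batch_means = [sum(mb) / len(mb) for mb in minibatches]         # [1.5, 4.0]
cost_mean = sum(batch_means) / len(batch_means)                 # 2.75, skewed by the 1-sample batch

print(cost_total, cost_mean)   # 2.0 2.75
```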
Exercise 6 - compute_total_loss
Implement the total loss function below. You will use it to compute the total loss of a batch of samples. With this convenient function, you can sum the losses across many batches, and divide the sum by the total number of samples to get the cost value.
- It's important to note that the "y_pred" and "y_true" inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes). tf.reduce_sum does the summation over the examples.
- You skipped applying "softmax" in Exercise 5, which will now be taken care of by tf.keras.losses.categorical_crossentropy by setting its parameter from_logits=True. (You can read the response by one of our mentors here in the Community for the mathematical reasoning behind it. If you are not part of the Community already, you can join by going here.) A small standalone demo of these two calls follows this list.
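For reference, here is that small demo of how the two calls fit together, using made-up labels and logits rather than the assignment's data, so it is only an illustration of the API and not the removed solution code:

```python
import tensorflow as tf

# Toy one-hot labels and raw logits, both of shape (number of examples, num_classes)
y_true = tf.constant([[0., 1., 0.],
                      [0., 0., 1.]])
y_pred = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 0.2, 3.0]])

# from_logits=True makes categorical_crossentropy apply the softmax internally,
# which is why softmax was skipped in Exercise 5. It returns one loss per example.
per_example_loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=True)

# tf.reduce_sum then adds the per-example losses into a single "total loss" scalar.
total_loss = tf.reduce_sum(per_example_loss)
print(per_example_loss.numpy(), total_loss.numpy())
```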
{moderator edit - solution code removed}
```python
def compute_total_loss_test(target, Y):
    pred = tf.constant([[ 2.4048107,   5.0334096 ],
                        [-0.7921977,  -4.1523376 ],
                        [ 0.9447198,  -0.46802214],
                        [ 1.158121,    3.9810789 ],
                        [ 4.768706,    2.3220146 ],
                        [ 6.1481323,   3.909829  ]])
    minibatches = Y.batch(2)
    for minibatch in minibatches:
        result = target(pred, tf.transpose(minibatch))
        break

    print("Test 1: ", result)
    assert(type(result) == EagerTensor), "Use the TensorFlow API"
    assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7), "Test 1 does not match. Did you get the reduce sum of your loss functions?"

    ### Test 2
    labels = tf.constant([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
    logits = tf.constant([[1., 0., 0.], [1., 0., 0.], [1., 0., 0.]])

    result = compute_total_loss(logits, labels)
    print("Test 2: ", result)
    assert np.allclose(result, 3.295837 ), "Test 2 does not match."

    print("\033[92mAll test passed")

compute_total_loss_test(compute_total_loss, new_y_train)
```
```
Test 1:  tf.Tensor(0.4051435, shape=(), dtype=float32)

AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
     25     print("\033[92mAll test passed")
     26 
---> 27 compute_total_loss_test(compute_total_loss, new_y_train )

<ipython-input-...> in compute_total_loss_test(target, Y)
     13     print("Test 1: ", result)
     14     assert(type(result) == EagerTensor), "Use the TensorFlow API"
---> 15     assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7), "Test 1 does not match. Did you get the reduce sum of your loss functions?"
     16 
     17     ### Test 2

AssertionError: Test 1 does not match. Did you get the reduce sum of your loss functions?
```