Section 3.2 in the Week 3 assignment is just not explained properly. Please help me fix this error.

3.2 Compute the Total Loss

All you have to do now is define the loss function that you’re going to use. In this case, since we have a classification problem with 6 labels, categorical cross-entropy will work!

You are used to computing the cost value by summing the losses over the whole batch (i.e., all mini-batches) of samples and then dividing the sum by the total number of samples. Here, you will achieve this in two steps.

In step 1, the compute_total_loss function only sums the losses from one mini-batch of samples. Then, when you train the model (in section 3.3), compute_total_loss is called once per mini-batch, and step 2 accumulates the sums from all of the mini-batches and finishes by dividing by the total number of samples to get the final cost value.

Computing the “total loss” instead of the “mean loss” in step 1 ensures that the final cost value is correct. For example, if the mini-batch size is 4 but there are only 5 samples in the whole batch, then the last mini-batch will have only 1 sample. Suppose the losses of the 5 samples are [0, 1, 2, 3, 4]; the final cost should be their average, which is 2. The “total loss” approach gives the same answer: (0 + 1 + 2 + 3 + 4) / 5 = 2. The “mean loss” approach, however, first gives 1.5 and 4 for the two mini-batches and then 2.75 after averaging them, which differs from the desired result of 2. Therefore, the “total loss” approach is adopted here.
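The arithmetic in this example can be checked with a few lines of plain Python. This is a minimal sketch with illustrative variable names; it simply replays the 5-sample, mini-batch-size-4 scenario above with both approaches.

```python
# Per-sample losses from the example: 5 samples, processed in mini-batches
# of size 4, so the last mini-batch holds only 1 sample.
losses = [0, 1, 2, 3, 4]
batch_size = 4
minibatches = [losses[i:i + batch_size] for i in range(0, len(losses), batch_size)]

# "Total loss" approach: sum within each mini-batch, accumulate the sums,
# and divide by the total number of samples only once, at the end.
total = 0
for mb in minibatches:
    total += sum(mb)
cost_total_approach = total / len(losses)

# "Mean loss" approach: average within each mini-batch, then average the averages.
mb_means = [sum(mb) / len(mb) for mb in minibatches]
cost_mean_approach = sum(mb_means) / len(mb_means)

print(cost_total_approach)  # 2.0
print(cost_mean_approach)   # 2.75
```

The two approaches only agree when every mini-batch has the same size, which is why the sum is accumulated and the division is deferred.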

Exercise 6 - compute_total_loss

Implement the total loss function below. You will use it to compute the total loss of a mini-batch of samples. With this convenient function, you can sum the losses across many mini-batches and divide the sum by the total number of samples to get the cost value.

  • It’s important to note that the “y_pred” and “y_true” inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes).
  • tf.reduce_sum does the summation over the examples.
  • You skipped applying “softmax” in Exercise 5; it will now be taken care of by tf.keras.losses.categorical_crossentropy when you set its parameter from_logits=True. (You can read the response by one of our mentors here in the Community for the mathematical reasoning behind it. If you are not part of the Community already, you can join by going here.)
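What from_logits=True means can be sketched in NumPy. The helper cross_entropy_from_logits below is hypothetical and is not the TensorFlow implementation; it assumes inputs shaped (number of examples, num_classes) with one-hot labels, applies softmax internally in the numerically stable log-softmax form, and then takes the cross-entropy per example.

```python
import numpy as np

def cross_entropy_from_logits(labels, logits):
    """Hypothetical helper: per-example categorical cross-entropy from raw logits.

    labels, logits: arrays of shape (number of examples, num_classes),
    with one-hot rows in `labels`. Softmax is applied inside the function,
    which is the idea behind from_logits=True.
    """
    # Subtract each row's max before exponentiating to avoid overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # With one-hot labels, this keeps only the log-probability of the true class.
    return -(labels * log_softmax).sum(axis=1)

labels = np.eye(3)          # 3 examples, one-hot over 3 classes
logits = np.zeros((3, 3))   # equal logits, so softmax is uniform (1/3 each)

per_example = cross_entropy_from_logits(labels, logits)
total_loss = per_example.sum()   # summed over the examples, like tf.reduce_sum
print(per_example, total_loss)   # each loss is ln(3) ~ 1.0986; total ~ 3.2958
```

Note that the result is one loss per example; turning it into a single number (sum or mean) is a separate step.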

{moderator edit - solution code removed}

import numpy as np
import tensorflow as tf
from tensorflow.python.framework.ops import EagerTensor  # type checked in Test 1

# compute_total_loss and new_y_train are defined in earlier notebook cells.
def compute_total_loss_test(target, Y):
    pred = tf.constant([[ 2.4048107,   5.0334096 ],
                        [-0.7921977,  -4.1523376 ],
                        [ 0.9447198,  -0.46802214],
                        [ 1.158121,    3.9810789 ],
                        [ 4.768706,    2.3220146 ],
                        [ 6.1481323,   3.909829  ]])
    minibatches = Y.batch(2)
    for minibatch in minibatches:
        result = target(pred, tf.transpose(minibatch))

    print("Test 1: ", result)
    assert(type(result) == EagerTensor), "Use the TensorFlow API"
    assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7), "Test 1 does not match. Did you get the reduce sum of your loss functions?"

    ### Test 2
    labels = tf.constant([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
    logits = tf.constant([[1., 0., 0.], [1., 0., 0.], [1., 0., 0.]])

    result = compute_total_loss(logits, labels)
    print("Test 2: ", result)
    assert np.allclose(result, 3.295837), "Test 2 does not match."

    print("\033[92mAll test passed")

compute_total_loss_test(compute_total_loss, new_y_train)
Test 1: tf.Tensor(0.4051435, shape=(), dtype=float32)

AssertionError                            Traceback (most recent call last)
     25 print("\033[92mAll test passed")
---> 27 compute_total_loss_test(compute_total_loss, new_y_train )

in compute_total_loss_test(target, Y)
     13     print("Test 1: ", result)
     14     assert(type(result) == EagerTensor), "Use the TensorFlow API"
---> 15     assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7), "Test 1 does not match. Did you get the reduce sum of your loss functions?"
     17     ### Test 2

AssertionError: Test 1 does not match. Did you get the reduce sum of your loss functions?


What specifically would you like to see corrected?


The problem is that you have used a different cost function than the one they told you to use. It turns out that the one you used:


Does not behave the same way as the one they gave you:


The former version (by default) gives the mean of the individual loss values. The latter one gives you back the vector of the individual losses. The specification here is that you need the sum, not the mean, because we are doing “minibatch” GD. That means we accumulate the sum over all the minibatches and then compute the mean only at the end of all the batches. You need to do it that way because you can’t take the average of the minibatch averages to get the overall average: the math doesn’t work if the minibatches are not all the same size. And that is not guaranteed, right? If you go back and look carefully at how this was handled in the previous Optimization assignment in Week 2, you’ll see that it was done that way there also. There they wrote it all out directly in Python instead of using TF.
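A minimal sketch of the difference, assuming the two versions being compared are the loss class tf.keras.losses.CategoricalCrossentropy (whose instances, when called, reduce to the mean by default) and the function tf.keras.losses.categorical_crossentropy from the instructions (which returns one value per example). The labels and logits are made-up example data in the (number of examples, num_classes) layout that Keras expects, not the assignment's data.

```python
import numpy as np
import tensorflow as tf

# Made-up example data: 3 examples, 3 classes, shape (number of examples, num_classes).
labels = tf.constant([[1., 0., 0.],
                      [0., 1., 0.],
                      [0., 0., 1.]])
logits = tf.constant([[2., 1., 0.],
                      [0., 3., 1.],
                      [1., 1., 1.]])

# Loss *class*: calling an instance reduces over the batch; by default
# the result is the mean of the per-example losses (a scalar).
mean_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)(labels, logits)

# Loss *function*: returns one loss value per example, shape (3,).
per_example = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)

# Summing the per-example vector gives the mini-batch total loss.
total_loss = tf.reduce_sum(per_example)

print(per_example.shape)
print(float(mean_loss), float(total_loss))
```

The class instance's scalar equals the total divided by the number of examples, which is why using it fails a test that expects the sum.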


Thank you so much! I appreciate your help. I had spent 3 hours trying to fix this earlier. Specifically, what might help is if the explanation section mentioned up front that these two subtly different functions exist.


Well, to be fair, in the instructions for that function they specifically gave you a link to the documentation for the correct loss function.
