Section 3.2 in the Week 3 assignment is just not explained properly. Please help me fix this error.

3.2 Compute the Total Loss

All you have to do now is define the loss function that you're going to use. In this case, since we have a classification problem with 6 labels, categorical cross-entropy will work!

You are used to computing the cost value by summing the losses over the whole batch (i.e. all mini-batches) of samples and then dividing the sum by the total number of samples. Here, you will achieve this in two steps.

In step 1, the compute_total_loss function will only take care of summing the losses from one mini-batch of samples. Then, as you train the model (in section 3.3), which calls compute_total_loss once per mini-batch, step 2 is done by accumulating the sums from the mini-batches and finishing with a division by the total number of samples to get the final cost value.

Computing the "total loss" instead of the "mean loss" in step 1 ensures the final cost value is consistent. For example, if the mini-batch size is 4 but there are only 5 samples in the whole batch, then the last mini-batch has just 1 sample. Suppose the losses of the 5 samples are [0, 1, 2, 3, 4]; the final cost should be their average, which is 2. The "total loss" approach gives exactly that: the per-mini-batch sums 6 and 4 add up to 10, which divided by 5 samples is 2. The "mean loss" approach, however, first gives 1.5 and 4 for the two mini-batches, and then 2.75 after averaging them, which differs from the desired result of 2. Therefore, the "total loss" approach is adopted here.
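Here is a minimal sketch (the per-example loss values are made up to mirror the example above) showing why averaging the per-mini-batch means gives the wrong answer when the mini-batches have different sizes, while summing and dividing once at the end does not:

import tensorflow as tf

# Made-up per-example losses: a mini-batch of 4 samples followed by a mini-batch of 1.
minibatch_losses = [tf.constant([0., 1., 2., 3.]), tf.constant([4.])]

# "Total loss" approach: sum within each mini-batch, divide once at the end.
total = sum(tf.reduce_sum(l) for l in minibatch_losses)
num_samples = sum(int(l.shape[0]) for l in minibatch_losses)
print(total / num_samples)            # 10 / 5 = 2.0 (the correct average)

# "Mean loss" approach: averaging the per-mini-batch means is biased.
means = [tf.reduce_mean(l) for l in minibatch_losses]
print(sum(means) / len(means))        # (1.5 + 4.0) / 2 = 2.75 (wrong)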

Exercise 6 - compute_total_loss

Implement the total loss function below. You will use it to compute the total loss of a mini-batch of samples. With this convenient function, you can sum the losses across many mini-batches and divide the sum by the total number of samples to get the cost value.

  • It’s important to note that the “y_pred” and “y_true” inputs of tf.keras.losses.categorical_crossentropy are expected to be of shape (number of examples, num_classes).
  • tf.reduce_sum does the summation over the examples.
  • You skipped applying "softmax" in Exercise 5; it will now be taken care of by tf.keras.losses.categorical_crossentropy when you set its parameter from_logits=True. (You can read the response by one of our mentors here in the Community for the mathematical reasoning behind it. If you are not part of the Community already, you can join by going here.) A quick numerical check of this equivalence is sketched right after this list.
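The snippet below only illustrates the from_logits=True point above; it is not the exercise solution, and the logits and labels are made up. Passing raw logits with from_logits=True gives the same loss as applying softmax yourself and passing the resulting probabilities:

import tensorflow as tf

# Made-up logits (pre-softmax scores) and a one-hot label, for illustration only.
logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1., 0., 0.]])

# Raw logits with from_logits=True ...
loss_from_logits = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)

# ... match softmax-ed probabilities with from_logits=False (the default).
loss_from_probs = tf.keras.losses.categorical_crossentropy(labels, tf.nn.softmax(logits))

print(loss_from_logits.numpy(), loss_from_probs.numpy())  # both ~0.417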

{moderator edit - solution code removed}

def compute_total_loss_test(target, Y):
    pred = tf.constant([[ 2.4048107,   5.0334096 ],
                        [-0.7921977,  -4.1523376 ],
                        [ 0.9447198,  -0.46802214],
                        [ 1.158121,    3.9810789 ],
                        [ 4.768706,    2.3220146 ],
                        [ 6.1481323,   3.909829  ]])
    minibatches = Y.batch(2)
    for minibatch in minibatches:
        result = target(pred, tf.transpose(minibatch))
        break

    print("Test 1: ", result)
    assert(type(result) == EagerTensor), "Use the TensorFlow API"
    assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7), "Test 1 does not match. Did you get the reduce sum of your loss functions?"

    ### Test 2
    labels = tf.constant([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
    logits = tf.constant([[1., 0., 0.], [1., 0., 0.], [1., 0., 0.]])

    result = compute_total_loss(logits, labels)
    print("Test 2: ", result)
    assert np.allclose(result, 3.295837), "Test 2 does not match."

    print("\033[92mAll test passed")


compute_total_loss_test(compute_total_loss, new_y_train)
Test 1: tf.Tensor(0.4051435, shape=(), dtype=float32)

AssertionError                            Traceback (most recent call last)
<ipython-input-...> in <module>
     25 print("\033[92mAll test passed")
     26
---> 27 compute_total_loss_test(compute_total_loss, new_y_train )

<ipython-input-...> in compute_total_loss_test(target, Y)
     13     print("Test 1: ", result)
     14     assert(type(result) == EagerTensor), "Use the TensorFlow API"
---> 15     assert (np.abs(result - (0.50722074 + 1.1133534) / 2.0) < 1e-7), "Test 1 does not match. Did you get the reduce sum of your loss functions?"
     16
     17     ### Test 2

AssertionError: Test 1 does not match. Did you get the reduce sum of your loss functions?


What specifically would you like to see corrected?


The problem is that you have used a different cost function than the one they told you to use. It turns out that the one you used:

tf.keras.losses.CategoricalCrossentropy

does not behave the same way as the one they gave you:

tf.keras.losses.categorical_crossentropy

The former version (by default) returns the mean of the individual loss values. The latter returns the vector of the individual losses. The specification here is that you need the sum, not the mean, because we are doing "minibatch" GD. That means we accumulate the sum over all the minibatches and then compute the mean only at the end of all the batches. You need to do it that way because you can't take the average of the minibatch averages to get the overall average: the math doesn't work if the minibatches are not all the same size. And that is not guaranteed, right? If you go back and look carefully at how this was handled in the previous Optimization assignment in Week 2, you'll see that it was done that way there also. There they wrote it all out directly in Python instead of using TF.
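Here is a small sketch (with made-up labels and logits) that just shows the different return behaviors of the two:

import tensorflow as tf

# Made-up one-hot labels and raw logits for two examples.
labels = tf.constant([[1., 0., 0.], [0., 1., 0.]])
logits = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])

# Class form: by default reduces to the MEAN of the per-example losses (a scalar).
loss_obj = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(loss_obj(labels, logits))

# Function form: returns one loss value PER example (a vector of length 2).
per_example = tf.keras.losses.categorical_crossentropy(labels, logits, from_logits=True)
print(per_example)

# The assignment wants the sum over the mini-batch, so reduce_sum that vector.
print(tf.reduce_sum(per_example))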


Thank you so much! I appreciate your help. I had spent 3 hours on this trying to fix it earlier. Specifically, what might help is if they mention up front in the explanation section that these two subtly different functions exist.

Best
Amardeep

Well, to be fair, in the instructions for that function they specifically gave you a link to the documentation for the correct loss function.
