The computation of the cost function: compute_cost()

Hello,

I have implemented the previous questions of the notebook correctly, but I am struggling with the cost function.

I have used the tf.keras.losses.categorical() function for this, assigning the right tensors to the y_true and y_pred arguments, and reshaping the input tensors with something like tf.reshape(labels, tf.transpose(tf.shape(labels))). Is that the right approach? I am getting an error. Another thing is that I didn’t use tf.reduce_mean() at all, because I don’t know where it should go.

Can someone give me some tips on how to solve this?

Thanks

There should be no need to use “reshape” there: just transpose the logits and the labels. The other thing to note is that because we are passing “logits” as the input, meaning there is no output activation applied, you need to use the from_logits argument to the categorical loss function to tell it that. Please consult the documentation for the loss function: they give you the link in the instructions.

Then you need to use tf.reduce_sum to produce a scalar loss value: the output of the loss function is “per sample”, and you need to sum across the samples. Note that the overall cost is still the average of the per-sample costs, but this function is designed to be the lower-level piece that computes the total cost per minibatch; you take the average only once you have the sum across all the minibatches.
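For reference, here is a minimal sketch of the pattern described above. The function name, argument order, and the assumption that logits and labels arrive with shape (num_classes, num_examples) are mine, not the graded code:

```python
import tensorflow as tf

def compute_cost(logits, labels):
    """Total (summed, not averaged) categorical cross-entropy cost for one minibatch.

    Assumes logits and labels have shape (num_classes, num_examples), as produced
    by forward propagation in the notebook. The Keras loss expects
    (num_examples, num_classes), so both tensors are transposed first.
    """
    per_example_loss = tf.keras.losses.categorical_crossentropy(
        tf.transpose(labels),   # y_true
        tf.transpose(logits),   # y_pred: raw logits, no softmax applied yet
        from_logits=True,       # tell the loss to apply the softmax internally
    )
    # Sum across the examples in this minibatch; the average over the whole
    # epoch is taken later, outside this function.
    return tf.reduce_sum(per_example_loss)
```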


Hi Paul,

Thanks so much! I followed the advice and it works perfectly 🙂

In the latest lab it asks us to use tf.reduce_sum instead of tf.reduce_mean. The code only passes when you use tf.reduce_sum and don’t divide by the total number of examples. This is odd, as that’s not how we learned to compute the cost function. Why is that?

Once you switch to minibatch gradient descent, it works better to keep the running sum of the cost over all the samples in each batch. When you have the total sum for the complete epoch, you divide by the number of samples in the full epoch to get the final average cost. Averaging at the level of the individual minibatches doesn’t work unless all the minibatches are the same size. If the minibatch size does not evenly divide the total training set, the last minibatch will be a different size, which means the average of the averages is not the same as the overall average, right?
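A tiny numeric illustration of that last point, with made-up per-example losses split into minibatches of sizes 3 and 1:

```python
import tensorflow as tf

# Made-up per-example losses, split into two minibatches of different sizes (3 and 1).
batch_losses = [tf.constant([1.0, 2.0, 3.0]), tf.constant([4.0])]

# Averaging inside each batch, then averaging the batch averages:
avg_of_avgs = sum(tf.reduce_mean(b) for b in batch_losses) / len(batch_losses)  # (2.0 + 4.0) / 2 = 3.0

# Keeping a running sum and dividing once by the total number of examples:
total = sum(tf.reduce_sum(b) for b in batch_losses)   # 1 + 2 + 3 + 4 = 10.0
m = sum(int(tf.size(b)) for b in batch_losses)        # 4 examples in the epoch
true_avg = total / m                                   # 10.0 / 4 = 2.5

print(float(avg_of_avgs), float(true_avg))  # 3.0 vs 2.5: they differ because the batch sizes differ
```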


BTW you can see this method of handling the minibatch cost in action in the Optimization Methods assignment in Week 2 of Course 2. Note that the compute_cost function they provide in the utility function file returns the sum of the cost across the samples of the current batch.

This is very helpful, thank you!

Hi, can you please explain it in simpler terms?

Here is another thread which discusses that in more detail. Please have a look and let us know if that’s not enough.


Thanks Paul, this was super helpful.