A doubt in Week 3 Assignment

JJaassoonn · September 4, 2023, 6:36am

Dear Administrator,

Could you please guide me on the following issue?

I found that in “Section 3.3 - Train the Model” of “Week 3 Assignment - Tensorflow_introduction”, the total loss is used to calculate the gradients instead of the mean loss, which I learned about in the previous lectures. May I know whether this approach from the assignment is correct?

*I am using pseudocode to avoid breaching the rules.

For each (minibatch_X, minibatch_Y) in minibatches:

   With tf.GradientTape() as tape:

        Z3 <- Call forward propagation function

        minibatch_total_loss <- compute **total loss**

   Define trainable_variables as W1, b1, W2, b2, W3, b3

   grads <- Call tape.gradient function,
            passing minibatch_total_loss and
            trainable_variables as arguments

   Update parameters by using optimizer

Thank you.

balaji.ambresh · September 4, 2023, 9:22am

Calculating the gradient based on total loss is incorrect. As you observed, mean is correct.

Please wait while I get more information from other mentors / staff regarding this.

paulinpaloalto · September 4, 2023, 7:31pm

The reason that compute_total_loss uses the sum instead of the mean is that it is used on the individual minibatches. We want the overall average cost for the whole epoch, but we can’t get that by taking the average at the minibatch level: the math doesn’t work unless all the minibatches are the same size, which might not be true, right? So we compute the running sum and then finally compute the mean at the end of the epoch.

        # We divide the epoch total loss over the number of samples
        epoch_total_loss /= m
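To make the point about unequal minibatch sizes concrete, here is a minimal sketch in plain Python. The loss values and the variable names other than `epoch_total_loss` and `m` are made up for illustration and are not taken from the assignment. It compares summing per minibatch and dividing by `m` once against naively averaging the per-minibatch means.

```python
# Hypothetical per-example losses, split into minibatches of unequal size.
minibatch_losses = [
    [1.0, 2.0, 3.0, 4.0],   # minibatch of size 4
    [5.0, 6.0, 7.0, 8.0],   # minibatch of size 4
    [10.0, 20.0],           # last (smaller) minibatch, size 2
]

m = sum(len(batch) for batch in minibatch_losses)  # total number of examples

# Approach described above: accumulate the sums, divide by m once per epoch.
epoch_total_loss = 0.0
for batch in minibatch_losses:
    epoch_total_loss += sum(batch)
epoch_mean_from_sums = epoch_total_loss / m

# Naive alternative: average the per-minibatch means.
batch_means = [sum(batch) / len(batch) for batch in minibatch_losses]
mean_of_means = sum(batch_means) / len(batch_means)

# Reference value: the mean over all examples at once.
all_losses = [loss for batch in minibatch_losses for loss in batch]
true_mean = sum(all_losses) / m

print(epoch_mean_from_sums, mean_of_means, true_mean)
```

In this example the running-sum approach matches the mean over all examples, while the mean of the minibatch means does not, because the smaller last minibatch is given the same weight as the full ones.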

You’re right that the code uses the gradients computed relative to the sum at the minibatch level. That just means the gradients are scaled up by the scalar factor $m_b$, where $m_b$ is the minibatch size. They are still vectors that point in the same direction, so we might need to tweak the learning rate to be lower, but the result is the same. If you minimize $J$, you’ve also minimized $\frac{1}{m} J$, right? In other words, the solution we end up finding by Gradient Descent should be the same.
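The scaling argument can be checked numerically. The following is a rough sketch in plain Python, not the assignment's code: it assumes a toy one-parameter model with squared-error loss and plain gradient descent, and the helper names `grad_sum`, `grad_mean`, and `descend` are hypothetical. An adaptive optimizer may respond to gradient scale differently, so this only illustrates the plain Gradient Descent case discussed above.

```python
# Toy minibatch: y = 2x, model prediction is w * x, loss is squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
m_b = len(xs)  # minibatch size


def grad_sum(w, xs, ys):
    """Gradient w.r.t. w of the summed squared error over the minibatch."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys))


def grad_mean(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error over the minibatch."""
    return grad_sum(w, xs, ys) / len(xs)


# The two gradients differ only by the scalar factor m_b.
ratio = grad_sum(0.0, xs, ys) / grad_mean(0.0, xs, ys)


def descend(grad_fn, lr, steps=50, w=0.0):
    """Plain gradient descent on a single parameter."""
    for _ in range(steps):
        w -= lr * grad_fn(w, xs, ys)
    return w


lr = 0.01
w_mean = descend(grad_mean, lr)        # mean loss, learning rate lr
w_sum = descend(grad_sum, lr / m_b)    # summed loss, learning rate lr / m_b

print(ratio, w_mean, w_sum)
```

With the learning rate divided by `m_b`, descending on the summed loss follows the same trajectory as descending on the mean loss (up to floating-point rounding), and both approach the same minimizer, `w = 2`.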
