I have recently completed the Deep Learning Specialization (DLS). According to the DLS course, the correct sequence of steps in each iteration of gradient descent is as follows:
- Initialize parameters
- Classify or predict
- Calculate loss
- Calculate gradients
- Update parameters
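In code, that sequence looks roughly like this. This is only a minimal logistic-regression sketch to make the ordering concrete; the data, shapes, learning rate, and iteration count are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy, linearly separable data (illustrative only): 2 features, 100 examples
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, 100)

# 1. Initialize parameters
w = np.zeros((2, 1))
b = 0.0
alpha = 0.1            # learning rate (illustrative)
m = X.shape[1]

for i in range(1000):
    # 2. Classify or predict (forward pass)
    A = sigmoid(w.T @ X + b)
    # 3. Calculate loss (cross-entropy cost J)
    J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # 4. Calculate gradients
    dZ = A - Y
    dw = (X @ dZ.T) / m
    db = np.mean(dZ)
    # 5. Update parameters
    w -= alpha * dw
    b -= alpha * db
```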
However, I am confused by this slide. Am I missing something here? Could someone please explain this to me? Thank you!
Yes, the key point there is that the actual loss value is not used in computing the gradients. The only purpose of the J value itself is as an inexpensive proxy for whether you are getting convergence or not. So it doesn't matter whether you calculate it before or after you compute the gradients. The gradients are functions derived from the loss, but they don't actually depend on the scalar J value: you just evaluate those functions according to Prof Ng's formulas.
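You can see this directly in code. In the sketch below (the data and parameter values are made up for illustration), the gradients for logistic regression are evaluated purely from the formulas, and the scalar J is computed separately, only as a diagnostic:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values: 2 features, 3 examples
X = np.array([[1.0, -2.0, 3.0],
              [0.5,  1.5, -1.0]])
Y = np.array([[1.0, 0.0, 1.0]])
w = np.array([[0.1], [-0.2]])
b = 0.0
m = X.shape[1]

A = sigmoid(w.T @ X + b)      # predictions

# Gradient formulas derived from the cross-entropy loss --
# note that the scalar J never appears anywhere here:
dZ = A - Y
dw = (X @ dZ.T) / m
db = np.mean(dZ)

# J is computed separately, purely as a convergence diagnostic:
J = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
```

The gradient formulas are what you get by differentiating J analytically, which is why the numeric value of J itself is never needed to evaluate them.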
A better measure of convergence is the actual prediction accuracy on the training data (and optionally the validation data), since that is what your goals are actually stated in terms of. And it turns out that there is not a monotonic relationship between cost and accuracy, because accuracy is quantized. The other thing to realize about J is that it is essentially meaningless by itself; e.g. it is not comparable between two different models. For convergence you just want to look at the graph of the cost versus iterations to get a picture of what is happening. Accuracy is more expensive to compute, so the common practice is to evaluate it only every 100 or 500 or 1000 iterations and to use the cost graph to get a picture of how convergence is going.
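That monitoring pattern can be sketched like this: record the cheap cost J on every iteration, but evaluate accuracy only periodically. The model, data, and the interval of 100 are all illustrative, not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (illustrative): 2 features, 200 examples
rng = np.random.default_rng(1)
X = rng.normal(size=(2, 200))
Y = (X[0] - X[1] > 0).astype(float).reshape(1, 200)
w, b, alpha, m = np.zeros((2, 1)), 0.0, 0.1, X.shape[1]

costs, accuracies = [], []
for i in range(500):
    A = sigmoid(w.T @ X + b)
    # Cheap: cost recorded every iteration
    costs.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    # More expensive in general: accuracy only every 100 iterations
    if i % 100 == 0:
        accuracies.append(np.mean((A > 0.5) == Y))
    dZ = A - Y
    w -= alpha * (X @ dZ.T) / m
    b -= alpha * np.mean(dZ)
```

Plotting `costs` against the iteration number then gives you the usual "cost vs. iterations" convergence graph.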
Very well explained! Thank you for taking the time to explain this in detail!