Repeat until convergence?

I must be missing something really obvious, but please bear with me…
At the end of Week 1 we define gradient descent as: repeat (updating w and b) until convergence.
I understand what it means, and the visualizations are very helpful, but I don’t see in the code how we test for convergence. Shouldn’t we be comparing the updated cost function J(w, b) with the previous cost function J(w, b) to ensure it is still decreasing and we did not overstep the minimum? I see in the code that we stop computing gradient descent when a fixed number of iterations is reached, but that number is arbitrary.
Surely we have to be able to do it programmatically?
Thank you!

Since this is Week 1 of an introductory course, we don’t actually test for convergence in the code; it’s done visually from the cost history plot.
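
That said, if you wanted to stop programmatically, here is a minimal sketch of the idea (the helper functions, toy data, and tolerance below are illustrative placeholders, not the lab’s actual code):

```python
import numpy as np

def compute_cost(x, y, w, b):
    # Mean squared error cost for a simple linear model f(x) = w*x + b
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

def compute_gradient(x, y, w, b):
    # Partial derivatives of the cost with respect to w and b
    m = x.shape[0]
    err = w * x + b - y
    return np.dot(err, x) / m, np.sum(err) / m

def gradient_descent(x, y, w, b, alpha, max_iters=10_000, tol=1e-7):
    # Repeat the update until the cost stops improving (or max_iters is hit)
    prev_cost = compute_cost(x, y, w, b)
    for i in range(max_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost = compute_cost(x, y, w, b)
        if cost > prev_cost:
            print(f"Cost increased at iteration {i}; alpha may be too large")
            break
        if prev_cost - cost < tol:
            print(f"Converged after {i} iterations")
            break
        prev_cost = cost
    return w, b

# Tiny synthetic example
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
w, b = gradient_descent(x, y, w=0.0, b=0.0, alpha=0.01)
```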

Hello @Svetlana_Verthein,

Great observation! For your information, that’s called “Early Stopping”, and it isn’t covered in this course; however, the idea is just like what you have suggested. In particular, we compare the cost on the cv set so that we “early stop” when the cv cost stops improving. TensorFlow implements this (here is the link), and I think you will want to have a look at the list of parameters you can set, such as monitor, min_delta, and patience.
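
For illustration only, here is a minimal sketch of how that callback can be wired up (the toy data, model architecture, and parameter values are placeholders, not recommendations):

```python
import numpy as np
import tensorflow as tf

# Toy data split into a training set and a cv (validation) set
x = np.linspace(0, 1, 200).reshape(-1, 1)
y = 3 * x + 1 + 0.1 * np.random.randn(200, 1)
x_train, y_train = x[:160], y[:160]
x_cv, y_cv = x[160:], y[160:]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop training when the cost on the cv set stops improving
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",   # watch the cv cost, not the training cost
    min_delta=1e-4,       # smallest decrease that still counts as an improvement
    patience=10,          # epochs to wait with no improvement before stopping
    restore_best_weights=True,
)

model.fit(x_train, y_train,
          validation_data=(x_cv, y_cv),
          epochs=500, verbose=0,
          callbacks=[early_stop])
```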

Cheers,
Raymond

Thank you, Raymond! This is very helpful. I’m in Week 2 now, and I understand why convergence cannot be tested by simply comparing the new cost J to the previous cost J (I thought that was all that was needed!)
It’s because if new J starts increasing, it may mean either: a) J minimum has been achieved or b) alpha is too large (or there is a bug in the code) - correct? Two very different scenarios.
Looking into Tensorflow EarlyStopping now - thanks for the link!

Another great point, but please let me adjust your words a little bit to the following:

> if new J stops improving, it may mean either: a) J minimum has been achieved or b) alpha is too large (or there is a bug in the code) - correct? Two very different scenarios.

I want to make 2 points:

  1. When you are in Course 2 Week 3, you will come across an idea called “splitting a data set into a training set and a cv set”. You will find out why it’s important that we evaluate our model on both the training set and the cv set under the cost function. Therefore, there WILL BE two J values, and you will learn how to use them (a small sketch follows this list).

  2. The two scenarios that you mentioned are pretty common in a model training process. (a) happens when the model successfully converges to a (local) minimum. (b) happens when the model diverges.
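
As a small preview of point 1, here is what “two J values” might look like in code (the data, split ratio, and cost function are illustrative only):

```python
import numpy as np

def mse_cost(x, y, w, b):
    # Squared error cost for a simple linear model f(x) = w*x + b
    return np.mean((w * x + b - y) ** 2) / 2

# Illustrative data, split into a training set and a cv set
x = np.linspace(0, 1, 100)
y = 2 * x + 0.5 + 0.05 * np.random.randn(100)
x_train, y_train = x[:70], y[:70]   # 70% used for training
x_cv, y_cv = x[70:], y[70:]         # 30% held out as the cv set

w, b = 2.0, 0.5                     # pretend these came from gradient descent
J_train = mse_cost(x_train, y_train, w, b)   # cost on the training set
J_cv = mse_cost(x_cv, y_cv, w, b)            # cost on the cv set
print(f"J_train = {J_train:.4f}, J_cv = {J_cv:.4f}")
```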

Keep learning!
Raymond
