Hello everyone,
When Prof Andrew talked about the disadvantages of early stopping, he said that early stopping limits the hyperparameter search space: since we don't fully optimize the cost function J, we can't pick the best model among those we train. But I think we could apply early stopping to each trained model and then choose the one that does the best job at the point where we stop gradient descent. Isn't this a good approach?
Yeah, you could choose the best out of them, but the point here is that if you had continued training, you might have found a much better optimum than the one you already have.
But the purpose of early stopping is to stop training when the dev set error starts increasing, even though there is a better optimum for the training set error. Because the model starts to overfit the data beyond the early stopping point, we can't find a much better optimum on the dev set anyway.
So the point here is that instead of continuing to train our models, choosing the one that minimizes J(train), and then taking that same model and trying to fix its overfitting, we can do both tasks more efficiently at once: we choose the model that minimizes not J(train) but J(dev). In other words, we choose the model that does the best job without overfitting the data. And here I don't see why it's a problem to perform both tasks at the same time.
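To make the approach concrete, here is a minimal sketch of what I mean, on a toy linear-regression problem (this is just an illustration I made up, not code from the course): each candidate model is trained until its dev loss stops improving, and then the candidate with the lowest dev loss at its stopping point is selected.

```python
import numpy as np

def train_with_early_stopping(lr, x_train, y_train, x_dev, y_dev,
                              patience=10, max_steps=1000):
    """Gradient descent on a 1-D linear model y = w*x + b, stopping once
    the dev-set MSE has not improved for `patience` consecutive steps."""
    w, b = 0.0, 0.0
    best = (np.inf, w, b)   # lowest dev loss seen so far, with its parameters
    bad_steps = 0
    for _ in range(max_steps):
        # gradient step on the training-set mean squared error
        err = w * x_train + b - y_train
        w -= lr * 2 * np.mean(err * x_train)
        b -= lr * 2 * np.mean(err)
        dev_loss = np.mean((w * x_dev + b - y_dev) ** 2)
        if dev_loss < best[0]:
            best = (dev_loss, w, b)
            bad_steps = 0
        else:
            bad_steps += 1
            if bad_steps >= patience:  # the early stopping point
                break
    return best  # (dev loss at the stopping point, w, b)

# synthetic data: y = 3x + 1 plus noise, split into train and dev sets
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3 * x + 1 + rng.normal(0, 0.1, 200)
x_train, y_train, x_dev, y_dev = x[:150], y[:150], x[150:], y[150:]

# train several candidates (here varying only the learning rate) and
# keep the one whose dev loss is lowest at its early stopping point
candidates = [train_with_early_stopping(lr, x_train, y_train, x_dev, y_dev)
              for lr in (0.01, 0.1, 0.3)]
best_dev_loss, w, b = min(candidates)
```

So each hyperparameter setting gets its own early-stopped model, and the final pick is by J(dev), exactly as described above.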
I think Prof Ng’s overall point here is that “early stopping” is too simplistic a strategy: it’s only part of the solution. The point is, what do you do if the results at the early stopping point are still not good enough to satisfy the requirements of your system? Or to put it another way: overfitting is just one problem you may have to deal with in order to get to a good enough solution. Stopping early is one way to approach it, but you may find cases in which that is not sufficient.
The general message in all this section of the course is that there is no one “silver bullet” magic recipe that will solve all your problems in all cases. You need to have a repertoire of approaches and have the experience and judgement to pick the one(s) that are appropriate for the problem at hand. It’s that kind of experience that Prof Ng is trying to impart to us here.