Hello everyone,
When Prof Andrew talked about the disadvantages of early stopping, he said that early stopping limits the hyperparameter search space: since we don't fully optimize the cost function J, we can't pick the best model among those we train. But I think we could apply early stopping to each trained model and then choose the one that does the best job at the point where we stop gradient descent. Isn't this a good approach?
Yeah, you could choose the best out of them, but the point here is that if you had continued training, you might have found a much better optimum than the one you already have.
But the purpose of early stopping is to stop training when the dev set error starts increasing, even though there is a better optimum for the training set error. Because the model starts to overfit the data beyond the early stopping point, we can't find a much better optimum on the dev set anyway.
So the point here is that instead of continuing to train our models, choosing the one that minimizes J(train), and then taking that same model and trying to fix its overfitting, we can do both tasks more efficiently at once: we choose the model that minimizes not J(train) but J(dev). In other words, we choose the model that does the best job without overfitting the data. And here I don't see why it's a problem to perform both tasks at the same time.
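To make the approach concrete, here is a minimal sketch of what I mean, on a toy linear-regression problem (this is just an illustration I made up, not code from the course): each candidate model is trained until its dev loss stops improving, and then the candidate with the lowest dev loss at its stopping point is selected.

```python
import numpy as np

def train_with_early_stopping(lr, x_train, y_train, x_dev, y_dev,
                              patience=10, max_steps=1000):
    """Gradient descent on a 1-D linear model y = w*x + b, stopping once
    the dev-set MSE has not improved for `patience` consecutive steps."""
    w, b = 0.0, 0.0
    best = (np.inf, w, b)   # lowest dev loss seen so far, with its parameters
    bad_steps = 0
    for _ in range(max_steps):
        # gradient step on the training-set mean squared error
        err = w * x_train + b - y_train
        w -= lr * 2 * np.mean(err * x_train)
        b -= lr * 2 * np.mean(err)
        dev_loss = np.mean((w * x_dev + b - y_dev) ** 2)
        if dev_loss < best[0]:
            best = (dev_loss, w, b)
            bad_steps = 0
        else:
            bad_steps += 1
            if bad_steps >= patience:  # the early stopping point
                break
    return best  # (dev loss at the stopping point, w, b)

# synthetic data: y = 3x + 1 plus noise, split into train and dev sets
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3 * x + 1 + rng.normal(0, 0.1, 200)
x_train, y_train, x_dev, y_dev = x[:150], y[:150], x[150:], y[150:]

# train several candidates (here varying only the learning rate) and
# keep the one whose dev loss is lowest at its early stopping point
candidates = [train_with_early_stopping(lr, x_train, y_train, x_dev, y_dev)
              for lr in (0.01, 0.1, 0.3)]
best_dev_loss, w, b = min(candidates)
```

So each hyperparameter setting gets its own early-stopped model, and the final pick is by J(dev), exactly as described above.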
I think Prof Ng’s overall point here is that “early stopping” is too simplistic a strategy: it’s only part of the solution. The point is, what do you do if the results at the early stopping point are still not good enough to satisfy the requirements of your system? Or to put it another way: overfitting is just one problem you may have to deal with in order to get to a good enough solution. Stopping early is one way to approach it, but you may find cases in which that is not sufficient.
The general message in all this section of the course is that there is no one “silver bullet” magic recipe that will solve all your problems in all cases. You need to have a repertoire of approaches and have the experience and judgement to pick the one(s) that are appropriate for the problem at hand. It’s that kind of experience that Prof Ng is trying to impart to us here.