Downside of Other Regularization Techniques - Early Stopping

Andrew mentions that one of the downsides of early stopping is that it couples two tasks, optimizing the cost and avoiding overfitting, so you can no longer work on each of them independently and do each one well.

The argument actually makes sense. But I’m curious: since overfitting is measured by the gap between the dev error and the train error, if early stopping finds the point where that gap is smallest, isn’t that exactly what we want to achieve?
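
To make my question concrete, here is a rough sketch of what I have in mind (a toy logistic regression example I made up, not code from the course assignments):

```python
import numpy as np

# Toy setup: logistic regression on synthetic, roughly separable data.
rng = np.random.default_rng(0)
true_w = rng.normal(size=20)
X_train = rng.normal(size=(500, 20)); y_train = (X_train @ true_w > 0).astype(int)
X_dev   = rng.normal(size=(200, 20)); y_dev   = (X_dev   @ true_w > 0).astype(int)

def error_rate(X, y, w, b):
    preds = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
    return np.mean(preds != y)

w, b = np.zeros(20), 0.0
lr, patience = 0.1, 50
best_dev_err, best_params, wait = np.inf, (w.copy(), b), 0

for it in range(5000):
    # One gradient step on the plain (unregularized) logistic loss.
    a = 1 / (1 + np.exp(-(X_train @ w + b)))
    w -= lr * (X_train.T @ (a - y_train)) / len(y_train)
    b -= lr * np.mean(a - y_train)

    # Early stopping: remember the parameters with the lowest dev error so far
    # (one could track the dev-train gap instead, which is what I asked about).
    dev_err = error_rate(X_dev, y_dev, w, b)
    if dev_err < best_dev_err:
        best_dev_err, best_params, wait = dev_err, (w.copy(), b), 0
    else:
        wait += 1
        if wait >= patience:   # no dev improvement for `patience` steps
            break

w, b = best_params
```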

It’s an interesting set of issues and there are no “silver bullet” or “one size fits all” answers. I think Prof Ng’s point here is that if you are trying to tune your regularization, early stopping forces you to manipulate a different hyperparameter at the same time. In other words, the number of iterations and the \lambda value (assuming L2 regularization) are separate hyperparameters, but with early stopping they are no longer orthogonal: stopping early to control overfitting also cuts the optimization short.

Note also that convergence is not guaranteed to be monotonic, so there might be an even better solution further out in terms of the number of iterations that you would miss. In other words, things can diverge for a bit and then converge to an even better solution; the shapes of the cost surfaces here are pretty complex. We will learn more sophisticated optimization techniques like Adam, RMSprop, and learning rate decay later this week.
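
To make the orthogonality point concrete, here is a rough sketch of how \lambda enters the cost as its own knob, independent of how many iterations you run (the function and variable names are illustrative, not the assignment code, and Y is assumed to have shape (1, m)):

```python
import numpy as np

def l2_regularized_cost(A_out, Y, weights, lambd):
    """Cross-entropy cost plus the L2 penalty (lambd / 2m) * sum of ||W_l||_F^2.

    `weights` is a list of the layer weight matrices; `lambd` is the L2
    hyperparameter, which can be tuned independently of the number of
    iterations you train for.
    """
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(A_out) + (1 - Y) * np.log(1 - A_out)) / m
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + l2_penalty
```

With this setup you can sweep \lambda while always training long enough to drive the training cost down, which is exactly the separation that early stopping gives up.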


I see the point now, thank you.