Early stopping - why does the dev set error start to increase?

I have a question regarding the Week 1 - Regularisation - Early Stopping video, at around 4:30.

There is a chart showing that the training error decreases as we run additional iterations. I don't understand why the dev set error starts to increase. Actually, I may be misunderstanding the whole concept. I thought that:

1. I'm using the training set to train my network - in each iteration I'm changing the W and b values.
2. After I train my model I have fixed W and b, and what I'm doing is tweaking hyperparameters. I'm not running multiple iterations with different W and b; I'm checking results for different hyperparameters while keeping W and b constant.

Is it correct?

Hi @Szymon.P.Marciniak,

Think about what happens if the hyperparameter you are tweaking is the number of layers or the number of neurons: then you necessarily need to train again, since you are creating connections that weren't there before, or removing some of them.

If you change, for example, the learning rate, then you also need to train again, because this hyperparameter is tied to the training process; otherwise, what is the point of changing the learning rate if you are not going to train with it?

And so on. The point I'm trying to make is that the hyperparameters change your model, so you need to train again to check whether it is improving or not.
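To make this concrete, here is a minimal sketch (not from the course; the data and learning rates are made up for illustration) of what hyperparameter tuning implies: for each candidate learning rate you retrain from scratch with fresh parameters, and only the dev set error is used to compare the candidates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear-regression data; the dev set is held out and never
# used to update the parameters.
true_w = np.array([2.0, -1.0])
X_train = rng.normal(size=(200, 2))
y_train = X_train @ true_w + rng.normal(scale=0.1, size=200)
X_dev = rng.normal(size=(60, 2))
y_dev = X_dev @ true_w + rng.normal(scale=0.1, size=60)

def train(lr, n_iters=100):
    """Retrain from scratch with this learning rate: w starts fresh."""
    w = np.zeros(2)
    for _ in range(n_iters):
        grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad                      # updates happen only here
    return w

best_lr, best_dev = None, np.inf
for lr in [0.001, 0.01, 0.1]:               # candidate hyperparameter values
    w = train(lr)                           # training uses the TRAIN set only
    dev_mse = np.mean((X_dev @ w - y_dev) ** 2)  # dev set only evaluates
    if dev_mse < best_dev:
        best_lr, best_dev = lr, dev_mse
```

Note that nothing is "transferred" to the dev set except the trained parameters: the dev set is only fed forward through the model to score each candidate.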

Regarding the dev error increasing: that is a sign that the model is overfitting. Your model is adapting too much to your training set, which leads to poor generalization, hence the bad behavior on the dev set.


Thanks for your answer. I still don't exactly understand why I should use a training set if I'm training the network from the beginning on the dev set. What data (parameters, settings, setups?) am I transferring from training to the dev set?

You don't train on the dev set; it is used to evaluate your model. You always train on the train set.

So I train on the train set, check on the dev set, and after changing something, train on the train set again?

I still don't understand the chart with iterations. If I'm only evaluating my model on the dev set, I would feed my NN with inputs, run forward propagation, and check the error (Y hat vs. true Y). There would be no multiple iterations - just a single one.

My understanding is that a single iteration is forward propagation + back propagation. Multiple iterations are run in the training phase (each iteration updates my W and b).
So I don't understand iterations on the dev set, since we are not training on the dev set.

Actually, there is an evaluation step at the end of each training iteration, in which the model also runs a forward pass on the validation input (without updating the weights).


Yes, you can evaluate on the dev set after each training iteration on the train set, to see how things are going; that way you have two measurements to compare, one for train and one for dev.

Thanks @nqhoang, and how is this PSG thing going? :stuck_out_tongue_winking_eye:

Pretty good here.
Getting along with my oldie Neymar and mbappe boi.