I have a question regarding the Week 1 - Regularisation - Early Stopping video, at around 4:30.
There is a chart showing that the training error decreases as we run additional iterations. I don't understand why the dev set error starts to increase. Actually, I may be misunderstanding the whole concept. I thought that:
1. I use the training set to train my network - in each iteration I update the W and b values.
2. After I train my model I have fixed W and b, and then I tweak hyperparameters. I'm not running multiple iterations with different W and b; I'm checking results for different hyperparameters while keeping W and b constant.
Think about what happens if the hyperparameter you are tweaking is the number of layers or the number of neurons: then you necessarily need to train again, since you are creating connections that weren't there before, or removing some of them.
If you change, for example, the learning rate, then you also need to train again, because this hyperparameter is tied to the training process - otherwise, what would be the point of changing the learning rate if you are not going to train with it?
And so on. The point I'm trying to make is that the hyperparameters change your model, so you need to train again to check whether it is improving or not.
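A minimal sketch of that idea (a made-up numpy toy model, not the course code): for every candidate value of a hyperparameter - here the learning rate - we re-train from scratch, and only then score the freshly trained W and b on the dev set.

```python
import numpy as np

def train_logreg(X, y, lr, n_iters=200, seed=0):
    """Re-train a tiny logistic-regression 'network' from scratch.

    Each call re-initialises w and b, so trying a new hyperparameter
    value (here the learning rate) means a full re-training run.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward prop
        grad = (y_hat - y) / len(y)                 # backward prop
        w -= lr * (X.T @ grad)
        b -= lr * grad.sum()
    return w, b

def error_rate(w, b, X, y):
    """Forward propagation only - no parameter updates happen here."""
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.mean((y_hat > 0.5) != y)

# Made-up toy data: the label is just the sign of the feature sum.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 2))
y_train = (X_train.sum(axis=1) > 0).astype(float)
X_dev = rng.normal(size=(100, 2))
y_dev = (X_dev.sum(axis=1) > 0).astype(float)

# For every candidate learning rate we train AGAIN from scratch,
# then compare the resulting models on the dev set.
dev_errors = {}
for lr in (0.01, 0.1, 1.0):
    w, b = train_logreg(X_train, y_train, lr)
    dev_errors[lr] = error_rate(w, b, X_dev, y_dev)
```

The dev set never changes the parameters; it is only used to compare the models that each re-training run produced.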
Regarding the dev error increasing: that is a sign that the model is overfitting. Your model is adapting too much to your training set, which leads to poor generalization, hence the bad behavior on the dev set.
Thanks for your answer. I still don't exactly understand why I should use a training set if I'm training networks from scratch for the dev set anyway. What data (parameters, settings, setups?) am I transferring from the training set to the dev set?
So I train on the train set, check on the dev set, change something, and then train on the train set again?
I still don't understand the chart with iterations. If I'm only evaluating my model on the dev set, I would feed my NN with inputs, run forward propagation, and check the error (Y hat vs. true Y). There would be no multiple iterations - just a single one.
My understanding is that a single iteration is forward propagation + back propagation. Multiple iterations are run in the training phase (each iteration updates my W and b).
So I don't understand iterations on the dev set, since we are not training on the dev set.
Yes - you can evaluate the dev set after each training iteration on the train set, to see how things are going. That way you have two measurements to compare, one for the train set and one for the dev set.
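That loop can be sketched like this (a made-up numpy toy model, not the course code): each training iteration does forward + backward prop on the train set, and then the dev set is run forward only, just to record its error.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def xent(y, p):
    """Cross-entropy loss (probabilities clipped to avoid log(0))."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Made-up toy split: few examples, many noisy features, so the
# model has room to overfit the training set.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 10))
y_train = (X_train[:, 0] > 0).astype(float)  # only feature 0 matters
X_dev = rng.normal(size=(200, 10))
y_dev = (X_dev[:, 0] > 0).astype(float)

w = rng.normal(scale=0.01, size=10)
b = 0.0
lr = 0.5

train_curve, dev_curve = [], []
for it in range(500):
    # one iteration = forward prop + backward prop ON THE TRAIN SET
    y_hat = sigmoid(X_train @ w + b)
    grad = (y_hat - y_train) / len(y_train)
    w -= lr * (X_train.T @ grad)
    b -= lr * grad.sum()

    # the dev set is only ever run FORWARD - no parameter update here
    train_curve.append(xent(y_train, sigmoid(X_train @ w + b)))
    dev_curve.append(xent(y_dev, sigmoid(X_dev @ w + b)))
```

Plotting `train_curve` and `dev_curve` against the iteration index gives the kind of chart shown in the video: if the dev curve turns upward while the train curve keeps falling, that is the overfitting point where early stopping would halt training.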
Thanks @nqhoang, and how is this PSG thing going?