In the first video, Training/Dev/Test sets, Prof Ng says (6:09): "the goal of the dev set or the development set is that you're going to test different algorithms on it and see which algorithm works better."
So, does he mean that we try different hyperparameters, such as the number of layers, the number of units per layer, the learning rate, etc., on the dev set rather than the training set? If so, what role does the training set play? This is confusing to me.
Thank you for pointing me to the previous posts. So, we actually fit each hyperparameter configuration (e.g., number of layers, units per layer) on the training set, and then use the dev set to evaluate and compare the performance of the different architectures. Is my understanding correct?
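To make the workflow concrete, here is a minimal sketch in NumPy (the data, the logistic-regression model, and the two learning rates are all hypothetical, just for illustration): each hyperparameter setting is trained on the training set only, and the dev set is used purely to score and compare the fitted models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-classification data: 2 features, roughly linearly separable.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Split: the TRAINING set fits the weights; the DEV set only scores them.
X_train, y_train = X[:400], y[:400]
X_dev, y_dev = X[400:], y[400:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr, steps=200):
    """Fit logistic-regression weights by gradient descent.
    Backpropagation happens here, on the training set only."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def dev_accuracy(w, b, X, y):
    """Score on the dev set: one forward pass, no gradients, no updates."""
    preds = (sigmoid(X @ w + b) > 0.5).astype(float)
    return np.mean(preds == y)

# Try two hyperparameter settings (here, learning rates): each is TRAINED
# on the training set and COMPARED on the dev set.
results = {}
for lr in (0.01, 1.0):
    w, b = train(X_train, y_train, lr)
    results[lr] = dev_accuracy(w, b, X_dev, y_dev)

best_lr = max(results, key=results.get)
print(results, "-> pick lr =", best_lr)
```

The key point is that `train` (which updates parameters) never sees `X_dev`; the dev set only enters through `dev_accuracy`, which is a read-only forward pass.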
Though I've read the previous discussions, can I confirm one thing? When we evaluate trained models on the dev set, do we run forward propagation only (no backpropagation) to produce the predictions?
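Yes, dev-set evaluation is just inference. A minimal sketch (the weights and dev examples below are made up for illustration): given parameters already fitted on the training set, scoring the dev set is a single forward pass with no gradient computation and no parameter update.

```python
import numpy as np

# Hypothetical weights, assumed already fitted on the TRAINING set.
w, b = np.array([1.5, -0.5]), 0.1

# A tiny hypothetical dev set.
X_dev = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
y_dev = np.array([1.0, 0.0, 0.0])

# Dev evaluation: forward propagation only -- no backpropagation,
# no weight updates. We only read out predictions and a score.
probs = 1.0 / (1.0 + np.exp(-(X_dev @ w + b)))
preds = (probs > 0.5).astype(float)
accuracy = np.mean(preds == y_dev)
print(accuracy)
```

In a framework like PyTorch this is the step you would wrap in `torch.no_grad()`; conceptually it is the same thing, a forward pass that never touches the parameters.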
I am also confused about this. I thought we do not do backpropagation on the dev set, so how can we overfit to it? I think Andrew mentioned in the lecture that if there is a big gap between dev error and test error, we may have overfit to the dev set and might want a larger dev set. Can anyone help clarify? Thanks.