The professor recommended an approach regarding train/dev sets.
You train on your training set, try different model architectures, and evaluate on the dev set. We iterate on that to get a good model.
Does he mean to train different models on the same dataset and choose the one that does best?
I'm actually confused: how do we train one model and then try another architecture on the dev set? When we train one model, we get trained parameters for that specific architecture; can those parameters be used for another architecture? How is that possible?
He probably means that you train different models on the same dataset, test each of them on the dev set, and then move forward with optimization of the hyper-params using the model that performs best.
Yes, it is as Lucian says: the point is that you make a set of choices for all your "hyperparameters", then you train on the training set. Then you use that trained model to compute predictions on the dev set. If the results are not good (underfitting, overfitting), you adjust your hyperparameters and try the whole cycle again. "Rinse and repeat".
Then once you have good performance on the train and dev sets, then and only then do you evaluate that model on the test set. The point is that the test set can't be used in any of the previous training, so that you get a more accurate view of how your trained model will perform on "real world" inputs that it was not trained on (i.e. that it "has never seen before").
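Here is a minimal sketch of that loop, just to make the idea concrete. The dataset, the candidate architectures, and the hyperparameter values are all made up for illustration; the point is only that each candidate is trained from scratch on the training set, compared on the dev set, and the test set is touched exactly once at the very end.

```python
# Toy illustration of the train / dev / test workflow (not the course's code).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Make a toy dataset and carve out train / dev / test splits.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Each architecture / hyperparameter choice gets its own freshly trained
# parameters; nothing is shared between candidates.
candidates = [
    {"hidden_layer_sizes": (16,), "alpha": 1e-4},
    {"hidden_layer_sizes": (64, 32), "alpha": 1e-3},
    {"hidden_layer_sizes": (128, 64), "alpha": 1e-2},
]

best_model, best_dev_acc = None, -1.0
for params in candidates:
    model = MLPClassifier(max_iter=500, random_state=0, **params)
    model.fit(X_train, y_train)          # train on the training set
    dev_acc = model.score(X_dev, y_dev)  # evaluate on the dev set
    if dev_acc > best_dev_acc:
        best_model, best_dev_acc = model, dev_acc

# Only the final chosen model is evaluated on the held-out test set.
print("best dev accuracy:", best_dev_acc)
print("test accuracy:", best_model.score(X_test, y_test))
```

So the trained parameters of one architecture are never reused by another; you simply retrain for each choice and let the dev set decide which choice to keep.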