When choosing a NN architecture, I understand Andrew to be saying that you train each of the 3 candidate models on the training data to get the weights for each one, then run the CV data through each trained model to see which gives the lowest loss (J).
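
To check that I have the procedure right, here is a minimal sketch of what I mean. It is my own illustration, not code from the course: polynomial fits of degree 1, 2 and 5 stand in for the three network architectures, the data is synthetic, and the names `fit_poly`, `loss` and `select_model` are made up for this example.

```python
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.
    Stand-in for 'training a network to get its weights'."""
    n = degree + 1
    # Normal equations (A^T A) w = A^T y, with A[i][j] = x_i ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(ata[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aty[i] - tail) / ata[i][i]
    return w

def loss(w, xs, ys):
    """Mean squared error J of weights w on a data set."""
    preds = (sum(wj * x ** j for j, wj in enumerate(w)) for x in xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def select_model(train, cv, degrees):
    """Fit every candidate on train, then pick the one with the lowest CV loss."""
    train_x, train_y = zip(*train)
    cv_x, cv_y = zip(*cv)
    cv_losses = {}
    for d in degrees:
        w = fit_poly(train_x, train_y, d)      # weights come from training data only
        cv_losses[d] = loss(w, cv_x, cv_y)     # comparison uses CV data only
    best = min(cv_losses, key=cv_losses.get)
    return best, cv_losses

# Synthetic data (an assumption for the sketch): y = x^2 plus Gaussian noise.
rng = random.Random(0)
data = [(x, x * x + rng.gauss(0, 0.1))
        for x in (rng.uniform(-1, 1) for _ in range(100))]
rng.shuffle(data)
train, cv = data[:60], data[60:80]   # the remaining 20 points would be the test set

best, cv_losses = select_model(train, cv, [1, 2, 5])
print("CV loss per candidate:", cv_losses)
print("chosen candidate:", best)
```

The point of the sketch is only the separation of roles: weights are fit on the training set, and the candidates are compared on the CV set.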

Once you have chosen a model, what's the reasoning behind, or the value of, estimating the generalization error using a test set?

Also, is there any value in reshuffling the data several times and repeating the above process with different training, CV and test splits to confirm which model is optimal?
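
Concretely, the repeated-shuffling idea I have in mind looks roughly like the sketch below. Again this is my own illustration under stated assumptions: three simple regressions (on x, x^2 and x^3) stand in for the three architectures, the data is synthetic, and `fit_line`, `mse`, `run_once` and `CANDIDATES` are hypothetical names.

```python
import random
from collections import Counter

def fit_line(us, ys):
    """Closed-form least squares for y = a + b * u (stand-in for training)."""
    n = len(us)
    mu, my = sum(us) / n, sum(ys) / n
    b = (sum((u - mu) * (y - my) for u, y in zip(us, ys))
         / sum((u - mu) ** 2 for u in us))
    return my - b * mu, b

def mse(params, us, ys):
    """Mean squared error of the fitted line on a data set."""
    a, b = params
    return sum((a + b * u - y) ** 2 for u, y in zip(us, ys)) / len(us)

# Three candidate "models", each a different feature of x (assumed for the sketch).
CANDIDATES = {"x": lambda x: x, "x^2": lambda x: x * x, "x^3": lambda x: x ** 3}

def run_once(data, rng):
    """One random 60/20/20 split: fit on train, select on CV, score on test."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    train, cv, test = shuffled[:60], shuffled[60:80], shuffled[80:]
    results = {}
    for name, feat in CANDIDATES.items():
        def split(part):
            return [feat(x) for x, _ in part], [y for _, y in part]
        params = fit_line(*split(train))
        results[name] = (mse(params, *split(cv)), mse(params, *split(test)))
    best = min(results, key=lambda k: results[k][0])   # lowest CV loss wins
    return best, results[best][1]                      # test loss of the winner

rng = random.Random(1)
data = [(x, x * x + rng.gauss(0, 0.1))
        for x in (rng.uniform(-1, 1) for _ in range(100))]

wins = Counter()
test_losses = []
for _ in range(20):                 # 20 different random splits
    best, test_loss = run_once(data, rng)
    wins[best] += 1
    test_losses.append(test_loss)

print("times each candidate was selected:", dict(wins))
print("mean test loss of the selected candidate:", sum(test_losses) / len(test_losses))
```

If the same candidate wins on nearly every split, I would read that as a sign the choice does not depend on one lucky split, though the repeats reuse the same 100 points and so are not independent.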