What is the value of the generalization error estimate using the "test" set of data?

When choosing a NN architecture, I understand Andrew saying to run the training data through each of the 3 models to get the weights for each potential model, then run the CV data through each to see which model gives the lowest loss J.

Once you have chosen a model, what's the reasoning or value of estimating the generalization error using a test set?

Also, is there any role of randomizing the data several times and going through the above process with different sets of training, CV and test data to confirm which model is optimal?

The reason the test set is used is to get an unbiased estimate. After selecting your best performing model, you measure its unbiased performance on the test set, which none of the models have seen, including the best performing one. The CV loss of the winning model is optimistically biased, because that model was chosen precisely for scoring lowest on the CV set.
If you do not really care about getting an unbiased estimate, you could just train using the training set, evaluate all the models on the development (CV) set, and go with the best one without further testing.
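The whole procedure can be sketched in a few lines. This is a minimal illustration on made-up synthetic data (not the course data), using polynomial degree as a stand-in for "architecture"; the 60/20/20 split sizes are an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 2x + noise
X = rng.uniform(-1, 1, 300)
y = 2 * X + rng.normal(0, 0.1, 300)

# Randomly split 60/20/20 into training, CV, and test sets
idx = rng.permutation(len(X))
tr, cv, te = idx[:180], idx[180:240], idx[240:]

def mse(w, xs, ys):
    """Mean squared error of polynomial coefficients w on (xs, ys)."""
    return np.mean((np.polyval(w, xs) - ys) ** 2)

# Fit each candidate "architecture" (polynomial degree) on the training set
candidates = [np.polyfit(X[tr], y[tr], deg) for deg in (1, 2, 3)]

# Model selection: pick the candidate with the lowest CV loss J
cv_losses = [mse(w, X[cv], y[cv]) for w in candidates]
best = candidates[int(np.argmin(cv_losses))]

# Generalization estimate: the test set was unseen during both training
# and selection, so this number is not biased by the selection step
test_error = mse(best, X[te], y[te])
print("CV losses:", cv_losses)
print("Test error:", test_error)
```

Note that the winning model's CV loss and its test error will generally differ slightly; the test error is the one to report.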