In the “Model selection and training/cross validation/test sets” video, Andrew says we should use the fitted parameters of each model architecture to calculate that architecture's cross validation error. For example, the parameters w<1>, b<1> refer to the first model architecture, but which units/layers do w<1>, b<1> refer to? Do they refer only to the last unit in the last layer, so that we calculate the error of that unit on the cross validation set?
Another question: after choosing a model based on the cross validation error, we then use the test set to calculate the test error. How do we know whether the test error is a good value? When calculating the cross validation error, we compare it across the different architectures, but we don’t have anything to compare the test error with, except the cross validation error. Should the test error be similar to the cross validation error?
What’s the difference between the cost and the generalization error?
In this context I think he is referring to all the weights and biases of the entire model collectively — w<1>, b<1> is the full set of parameters of architecture 1 after it has been fit on the training set, not any single unit or layer!
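If it helps, here is a minimal sketch of that selection loop — not the course code; the data, hidden-layer sizes, and use of scikit-learn are just illustrative assumptions. Each candidate architecture is fit on the training set to get its own complete parameter set w<i>, b<i>, its cross validation error is computed on the held-out cross validation set, and only the chosen model is evaluated once on the test set at the end:

```python
# Minimal sketch (not the course code): model selection by cross validation error.
# The data, hidden-layer sizes, and hyperparameters below are made-up assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))                        # synthetic features
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=600)

# 60% training, 20% cross validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Each tuple is one candidate architecture; fitting it produces one complete
# set of weights and biases, i.e. w<i>, b<i> for architecture i.
architectures = [(8,), (16, 8), (32, 16, 8)]
models, cv_errors = [], []
for hidden in architectures:
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    model.fit(X_train, y_train)                      # learn w<i>, b<i> on the training set only
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))  # J_cv for architecture i
    models.append(model)

best = int(np.argmin(cv_errors))                     # pick the architecture with the lowest J_cv
test_error = mean_squared_error(y_test, models[best].predict(X_test))  # J_test, used only once
print("J_cv per architecture:", cv_errors)
print("chosen:", architectures[best], " J_test:", test_error)
```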
The test error, which is the error on data the model has not seen, should be as low as possible, or at least at the desired level; if it's not, then your model/parameters need to change. Ideally it will be close to the cross-validation error!
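For reference, if I remember right the lectures define it for regression (with no regularization term) as the average squared error over the test examples, and for the model chosen by cross validation you would expect it to come out close to J_cv:

$$
J_{test}(w, b) = \frac{1}{2\, m_{test}} \sum_{i=1}^{m_{test}} \left( f_{w,b}\!\left(x_{test}^{(i)}\right) - y_{test}^{(i)} \right)^2
$$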
In this context, I think the generalization error refers to performance in a real-life scenario, i.e. on data the model has not seen before, while the cost refers to the error measured on the training (or validation) data — that's the main difference.
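Concretely, assuming regularized linear regression as in the lectures, the cost minimized during training is computed on the training set and includes the regularization term, whereas the generalization error is estimated by the plain squared error on unseen data (J_cv or J_test above), with no regularization term:

$$
J(w, b) = \frac{1}{2\, m_{train}} \sum_{i=1}^{m_{train}} \left( f_{w,b}\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2\, m_{train}} \sum_{j=1}^{n} w_j^2
$$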