Since the parameters w and b weren’t trained using the test data, how come we can call the estimate overly optimistic when evaluating the cost function on the test data?
The test data shares similar patterns, distributions, and features with the training data, even though it wasn’t used for training. This similarity can cause the model to perform better on the test data than it would on entirely new, unseen data, giving an overly optimistic estimate of its generalization capability.
Hope this helps! Feel free to ask if you need further assistance.
As per the proposed solution, we need to use a cross-validation set, which is also carved out of the available data, e.g. 60% training set, 20% cross-validation set, 20% test set.
So the CV set might also contain similar patterns, distributions, and features. How is it distinguished from the test set?
The CV set is used to tune hyperparameters and adjust the model’s properties. The test set, however, is reserved as a completely unseen dataset, used only to evaluate the final model’s performance after the hyperparameters have been tuned. That final step estimates the model’s ability to generalize.
Statistically all three subsets should be very similar.
The only differences are in how they are used.
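To make the distinction concrete, here is a minimal Python sketch of the 60/20/20 split described above. The data and percentages are illustrative; the point is that the CV slice is consulted during tuning while the test slice is held back until the very end.

```python
import random

# Stand-in for 100 examples; in practice this would be your (X, y) pairs.
data = list(range(100))
random.seed(0)
random.shuffle(data)  # shuffle so all three subsets are statistically similar

n = len(data)
train = data[: int(0.6 * n)]               # 60% -- fit parameters w, b
cv    = data[int(0.6 * n): int(0.8 * n)]   # 20% -- tune hyperparameters
test  = data[int(0.8 * n):]                # 20% -- final, one-time generalization estimate

print(len(train), len(cv), len(test))  # 60 20 20
```

Statistically the three slices look alike, exactly as noted above; the safeguard is purely procedural: you may evaluate on the CV set as many times as you like while choosing the model, but you score the test set only once, after all choices are frozen.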