If I combine a relatively small subset of the train set with the dev set, is it always true that the combined set has the same distribution as the train set?
Any example would be much appreciated. Thank you!
No.
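Here is a minimal example. It uses synthetic one-dimensional data in which the training and dev sets come from different sources. The means 0.0 and 2.0 and the set sizes are made up for illustration. The combined set behaves like a mixture dominated by the dev data, not like the train set:

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: training examples come from one source (feature mean 0.0),
# dev examples from another (feature mean 2.0), so the two distributions differ.
train = [random.gauss(0.0, 1.0) for _ in range(10_000)]
dev = [random.gauss(2.0, 1.0) for _ in range(1_000)]

# Take a relatively small subset of the training set and combine it with dev.
train_subset = random.sample(train, 100)
combined = train_subset + dev

train_mean = statistics.mean(train)
combined_mean = statistics.mean(combined)

# The combined set is a mixture: 100/1100 train-like and 1000/1100 dev-like,
# so its expected mean is (100*0.0 + 1000*2.0) / 1100, about 1.82,
# far from the train mean of about 0.0.
print(f"train mean:    {train_mean:.2f}")
print(f"combined mean: {combined_mean:.2f}")
```

The combined set would match the train set only if the dev set happened to come from the same distribution. In general it follows a weighted mixture of the two.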
But train-dev is not a combination of training and dev data. It is only a subset of the training set that is held out and not used for training. So we can be sure that comparing train-dev accuracy with training accuracy is free of data-mismatch risk, and that the gap is a good measure of how much variance there is.
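Below is a minimal sketch of carving out a train-dev set and reading the error gaps. It assumes the data is held in plain Python lists. `split_train_dev`, `diagnose`, the 10% hold-out fraction, and the error rates passed in are all hypothetical and chosen only for illustration:

```python
import random


def split_train_dev(train_examples, train_dev_fraction=0.05, seed=0):
    """Hold out a random slice of the training set as train-dev.

    Both parts are drawn from the same shuffled pool, so they share the
    training distribution; only the first returned part is used for fitting.
    """
    rng = random.Random(seed)
    shuffled = list(train_examples)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * train_dev_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def diagnose(train_error, train_dev_error, dev_error):
    """Attribute error gaps to their likely causes.

    train -> train-dev: same distribution, unseen data, so the gap is variance.
    train-dev -> dev: different distribution, so the gap is data mismatch.
    """
    return {
        "variance": round(train_dev_error - train_error, 4),
        "data_mismatch": round(dev_error - train_dev_error, 4),
    }


train_part, train_dev = split_train_dev(range(1000), train_dev_fraction=0.1)
print(len(train_part), len(train_dev))  # 900 100

# Hypothetical error rates, for illustration only.
print(diagnose(train_error=0.01, train_dev_error=0.09, dev_error=0.10))
```

Train-dev comes from the same pool as the training data, so a large train to train-dev gap points to variance. A large train-dev to dev gap points to data mismatch instead.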