Why does the train-dev set have the same distribution as the train set and not the dev set?

Suppose I combine a relatively small subset of the train set with the dev set. Is it always true that the combined set would have the same distribution as the train set?
Any example would be much appreciated. Thank you.


But train-dev is not a combination of training and dev data. It is only a subset of the training data that is held out and not used for training, so it has the same distribution as the training set. That means the comparison of train-dev accuracy with training accuracy is free of data-mismatch risk and is a good measure of how much variance there is.
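Since you asked for an example, here is a minimal Python sketch of the idea. The helper names `carve_train_dev` and `error_gaps` are made up for illustration, and the error rates at the bottom are invented numbers, not results from any real model. It assumes the train-dev set is drawn at random from the training data, and it reads the gap between train-dev and training error as variance and the gap between dev and train-dev error as data mismatch.

```python
import random


def carve_train_dev(train_examples, fraction=0.1, seed=0):
    """Hold out a random slice of the training data as a train-dev set.

    Both returned parts come from the same pool, so they share the
    training distribution. No dev data is mixed in.
    """
    shuffled = list(train_examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * fraction)
    train_dev = shuffled[:n_holdout]   # never used for fitting
    train = shuffled[n_holdout:]       # used for fitting
    return train, train_dev


def error_gaps(train_err, train_dev_err, dev_err):
    """Split the train-to-dev error gap into two parts."""
    return {
        # Same distribution, unseen examples: a generalization (variance) gap.
        "variance": train_dev_err - train_err,
        # Both sets unseen, different distributions: a data-mismatch gap.
        "data_mismatch": dev_err - train_dev_err,
    }


# Hypothetical data: 1000 training examples represented by their indices.
train, train_dev = carve_train_dev(range(1000), fraction=0.1)

# Invented error rates, for illustration only.
gaps = error_gaps(train_err=0.01, train_dev_err=0.09, dev_err=0.10)
print(len(train), len(train_dev), gaps)
```

With these made-up numbers most of the gap shows up between training and train-dev error, which would point to variance rather than data mismatch. If you mixed dev examples into the held-out set instead, that first gap would blend both effects and you could no longer tell them apart.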
