It can happen, but it probably means the validation data does not represent the dataset well enough. If you use cross-validation with folds, it might give you a better overview.
Folds means: divide the dataset into, say, 5 sets (folds), and each time use one fold for validation and the other 4 for training. We run 5 rounds of training and validation, so the model ends up being validated on the entire dataset and we get a better overview of its performance. Check Cross-validation
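The 5-fold procedure described above can be sketched with scikit-learn (the toy dataset and logistic regression model here are just illustrative placeholders):

```python
# Minimal sketch of 5-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy dataset standing in for your real data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each of the 5 rounds trains on 4 folds and validates on the held-out fold,
# so every sample is used for validation exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores)          # one validation score per fold
print(scores.mean())   # averaged performance estimate
```

Looking at the spread of the per-fold scores (not just the mean) is also a quick way to see whether one particular split happens to be "easier" than the others.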
This is a way to do the split between training and cv data that ensures they are from the same statistical distribution. In the basic case you posit here, the most likely explanation is that the cv data is somehow “easier” than the training data. There are lots of complex scenarios where this can happen. Prof Ng spends quite a bit of time on this sort of issue in DLS Course 3. If you haven’t been through that yet, it is really worth your time. Or if you haven’t looked at it in a while, scan through the titles of the lectures and you’ll see some that should sound like they address this type of issue.
Note that the specific technique of folds that Gent refers to is not directly covered anywhere in DLS, at least not under that name. I first heard the term just in the last week on a discussion thread here, which is worth a look: K-fold cross validation - #2 by paulinpaloalto
Hi Paulin, your advice is always superb, no doubt. It's been a while since I took DLS and I will check it as well. There should definitely be better techniques than cross-validation with folds, since it is computationally expensive and suitable only for small datasets; it just came to mind because I have been reading about it recently. I was going through MLOps Course 2 some time ago, and they use TensorFlow Data Validation to build a schema from the training data, which is then compared to the validation data schema. If the schemas are not similar, then work is done on the features (all sorts), or maybe another shuffled partition is taken, checked, and worked on again, so that ultimately the schemas are similar before progressing further with training.
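The core idea behind that schema comparison (checking that the training and validation splits look statistically similar before training) can be illustrated with a hand-rolled check. Note this is only a rough stand-in for what TFDV does: TFDV builds a full schema and anomaly report, whereas here we just compare per-feature means; the tolerance, data, and `drifted_features` helper are all made up for the example:

```python
# Rough sketch of distribution checking between train/validation splits:
# flag features whose means differ by more than a tolerance.
# (Illustrative only -- TFDV's real schema comparison is far richer.)
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
valid = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
valid[:, 2] += 2.0  # deliberately shift one feature to simulate skew

def drifted_features(train, valid, tol=0.5):
    """Return indices of features whose means differ by more than tol."""
    diff = np.abs(train.mean(axis=0) - valid.mean(axis=0))
    return [i for i, d in enumerate(diff) if d > tol]

print(drifted_features(train, valid))  # the shifted feature gets flagged
```

If a check like this fires, the remedy is exactly what you describe: fix the features, or reshuffle and re-partition until the splits agree.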