Question about loss >>> cv_loss


As we know, it is a good sign when loss and cv_loss are very close to each other, because it suggests the model is not overfitting.

When loss <<< cv_loss, the model is overfitting, and we have solutions like L2 regularization, dropout, etc.

However, my question is about the case where loss >>> cv_loss and the gap is relatively large. What kind of solution should we use then?


You mean the training loss is much bigger than the validation loss? Then this is not the usual case of overfitting, where the training loss would be the smaller of the two.

Take a look here: the validation loss drops below the training loss.

WDYT @gent.spah ?

The model performs better on CV than on the training set.

It can happen, but it probably means the validation data is not representative enough of the dataset. Cross-validation with folds might give you a better overview.

What do you mean by folds? :stuck_out_tongue_closed_eyes:

Folds means this: say we divide the dataset into 5 sets (folds). Each round uses one fold for validation and the other 4 for training. After 5 rounds of training and validation, the model has been validated on the entire dataset, so we get a better overview of model performance. Check Cross-validation
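
To make the 5-round procedure concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the "model" is a toy that predicts the mean of the training targets, and the names `k_fold_indices`, `mse`, and `cross_validate` are hypothetical helpers, not functions from any library.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def mse(targets, prediction):
    """Mean squared error of a single constant prediction."""
    return sum((t - prediction) ** 2 for t in targets) / len(targets)


def cross_validate(targets, k=5, seed=0):
    """Run k rounds; each fold serves as the validation set exactly once.

    Toy stand-in for a real model: predict the mean of the training targets.
    Returns a list of (train_loss, val_loss) pairs, one per round.
    """
    folds = k_fold_indices(len(targets), k, seed)
    results = []
    for i, val_idx in enumerate(folds):
        # Training indices are everything outside the current validation fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        train_y = [targets[j] for j in train_idx]
        val_y = [targets[j] for j in val_idx]
        prediction = sum(train_y) / len(train_y)  # "fit" on the other k-1 folds
        results.append((mse(train_y, prediction), mse(val_y, prediction)))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    y = [rng.gauss(0.0, 1.0) for _ in range(100)]  # synthetic targets
    rounds = cross_validate(y, k=5)
    for r, (train_loss, val_loss) in enumerate(rounds, start=1):
        print(f"round {r}: train={train_loss:.3f}  val={val_loss:.3f}")
    avg_val = sum(v for _, v in rounds) / len(rounds)
    print(f"mean validation loss over 5 folds: {avg_val:.3f}")
```

In a single round the validation loss can land above or below the training loss just by luck of the split. Averaging over all folds is what smooths that out.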

This is a way to do the split between training and cv data that ensures they are from the same statistical distribution. In the basic case you posit here, the most likely explanation is that the cv data is somehow “easier” than the training data. There are lots of complex scenarios where this can happen. Prof Ng spends quite a bit of time on this sort of issue in DLS Course 3. If you haven’t been through that yet, it is really worth your time. Or if you haven’t looked at it in a while, scan through the titles of the lectures and you’ll see some that should sound like they address this type of issue.


Note that the specific technique of folds that Gent refers to is not directly covered anywhere in DLS, at least not by that name. I first heard the term just in the last week on a discussion thread here, which is worth a look: K-fold cross validation - #2 by paulinpaloalto

Dude, correct me if I am wrong.
The reason is data mismatch, right?

Yes, that lecture is a great place to dig deeper on this situation.

Hi Paulin, your advice is always superb, no doubt. It's been a while since I took DLS, and I will check it as well. There should definitely be better techniques than cross-validation with folds, because it is computationally expensive and suited mainly to small datasets. It just came to mind because I have been reading about it recently. I was going through MLOps course 2 some time ago, and they use TensorFlow Data Validation to build a schema from the training data, which is then compared against the validation data. If the two do not match, the features are reworked (in all sorts of ways), or another shuffled partition is taken, checked, and reworked again, so that the schemas end up similar before progressing further with training.
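
As a rough illustration of the schema idea, here is a simplified stand-in in plain Python. It does not use TensorFlow Data Validation and does not mirror its rules. The helpers `infer_schema` and `find_anomalies` are hypothetical, and treating the numeric range and the category set as the "schema" is an assumption made for this sketch.

```python
def infer_schema(rows):
    """Record, per feature, its kind plus the numeric range or category set seen."""
    schema = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            schema[name] = {"kind": "numeric", "min": min(values), "max": max(values)}
        else:
            schema[name] = {"kind": "categorical", "values": set(values)}
    return schema


def find_anomalies(schema, rows):
    """Return readable mismatches between `rows` and the training schema."""
    anomalies = []
    for name, spec in schema.items():
        if any(name not in row for row in rows):
            anomalies.append(f"{name}: missing in some rows")
            continue
        values = [row[name] for row in rows]
        if spec["kind"] == "numeric":
            if not all(isinstance(v, (int, float)) for v in values):
                anomalies.append(f"{name}: expected numeric values")
            elif min(values) < spec["min"] or max(values) > spec["max"]:
                anomalies.append(f"{name}: values outside training range")
        else:
            unseen = set(values) - spec["values"]
            if unseen:
                anomalies.append(f"{name}: unseen categories {sorted(map(str, unseen))}")
    return anomalies


if __name__ == "__main__":
    # Made-up toy rows, purely for illustration.
    train = [{"age": 25, "city": "Paris"}, {"age": 40, "city": "Rome"}]
    val = [{"age": 31, "city": "Paris"}, {"age": 67, "city": "Oslo"}]
    schema = infer_schema(train)
    for problem in find_anomalies(schema, val):
        print(problem)
```

An empty result would suggest the two splits look alike on these simple checks. Any reported mismatch is a hint to rework features or reshuffle the partition before training, as described above.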