In the video “Exploring augmentation with horses vs. humans”, the instructor said that you don’t just need a broad set of images for training; you also need them for testing, or the image augmentation won’t help you very much.
I wonder: is it useful to augment images in the validation set? Why don’t we do this in the lab?
The term “broad” here doesn’t mean augmented images; broad indicates a wide variety of samples from all the classes. Augmentation is used to make your dataset bigger through simple image-processing operations like rotation, flipping, cropping, etc. Having these in your validation or test set is just an additional task and adds to your computation, without contributing much to your model performance. This is the reason data augmentation is used only on the training set.
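As a rough, library-free sketch of those operations (the function name, shapes, and parameters here are my own invention, not from the lab), applied only to the training images:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Toy stand-in for library augmentation: randomly flip, rotate by a
    multiple of 90 degrees, and crop-then-resize one image. Each call
    returns a slightly different view of the same underlying sample."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                     # horizontal flip
    image = np.rot90(image, k=int(rng.integers(0, 4)))  # random 90-degree rotation
    # random crop to 3/4 size, then nearest-neighbour resize back (a toy "zoom")
    h, w = image.shape[:2]
    ch, cw = 3 * h // 4, 3 * w // 4
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]

train_image = rng.random((32, 32, 3))
augmented = [augment(train_image, rng) for _ in range(4)]  # training images only
# validation/test images would be fed to the model unchanged
```

In a real Keras pipeline the same idea is expressed through the training data generator's augmentation parameters, while the validation generator only rescales.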
My understanding is that augmentation is used when one doesn’t have data sufficient to train a model for what it will encounter in the ‘real world’. You augment to make the training data more congruent with the true, operational environment. But what would be the point of training on an augmented data set if you don’t validate and test against a similar distribution? When deployed, the model will encounter all the diversity of the operational environment but you will have no way of knowing in advance whether it is likely to perform well or not. Isn’t the ideal case such that training, validation, test, and operational all have similar content? I am struggling to see how augmentation of training only is the best approach. What am I missing?
I have answered your question in parts; please refer to the italic text for my answers. Let me know if you agree or if any part is still unclear.
My understanding is that augmentation is used when one doesn’t have data sufficient to train a model for what it will encounter in the ‘real world’. You augment to make the training data more congruent with the true, operational environment.
I agree. You could say that.
But what would be the point of training on an augmented data set if you don’t validate and test against a similar distribution?
Augmented data does not, in practice, change the distribution of the training data. The training distribution after augmentation still retains its original properties (mean, variance, etc.); augmentation just lets us sample more examples, transformed from the initial training distribution. When the training distribution and the test distribution are different, we are dealing with a more complicated problem altogether.
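The “retains the original statistics” claim can be checked directly for a flip, which only permutes pixels (rotations with border fill only preserve the statistics approximately). A minimal numpy check, with an invented toy batch:

```python
import numpy as np

rng = np.random.default_rng(42)
batch = rng.random((8, 32, 32, 3))      # a small stand-in "training batch"

flipped = batch[:, :, ::-1, :]          # horizontal flip of every image

# a flip merely rearranges pixels, so batch mean and variance are unchanged
same_mean = np.allclose(batch.mean(), flipped.mean())
same_var = np.allclose(batch.var(), flipped.var())
```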
When deployed, the model will encounter all the diversity of the operational environment but you will have no way of knowing in advance whether it is likely to perform well or not. Isn’t the ideal case such that training, validation, test, and operational all have similar content? I am struggling to see how augmentation of training only is the best approach. What am I missing?
*By training with augmented images, the model is already aware of features scattered throughout the image space. When deployed, you still do some basic processing like resizing or grayscale conversion in some contexts, and then feed the input to the model. It is also not a best practice to have “your own generated data (augmented)” in the test set, as it may not be an appropriate depiction of reality. A cat or dog could be hanging upside down after augmentation (I am not a maniac! just for the sake of the explanation), but there is a chance that you would never see this in reality or as an input to your model. You will lose a lot of accuracy with this overly robust testing.*
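A minimal sketch of that deployment-time path — only basic processing, no augmentation. The luma weights and target size are illustrative choices of mine, not from the course:

```python
import numpy as np

def preprocess(image, size=(64, 64)):
    """Basic deployment-time preprocessing only: grayscale conversion and a
    nearest-neighbour resize. No augmentation is applied to live inputs."""
    gray = image @ np.array([0.299, 0.587, 0.114])   # RGB -> luma
    h, w = gray.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return gray[rows][:, cols]

frame = np.random.default_rng(1).random((480, 640, 3))  # hypothetical camera frame
x = preprocess(frame)   # a (64, 64) grayscale array, ready for the model
```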
Thanks for the reply. Definitely food for thought. I’m still struggling with this concept. If you don’t want it in your test set because it might not be an appropriate depiction of reality, why would you want it in your training set? I’ll do some reading and share if I find something worthy.
You want it in the training set because you have less data; it helps you with overfitting and makes your model robust with respect to the features being learned. This is a good thing to have when you have a relatively small dataset. Sure!!
But if the validation set is not augmented like the training set, will the validation accuracy be higher than it should be? If we want to stop training when the val_acc metric reaches a certain threshold, will the performance of the trained model be worse than expected?
I tried the W4 assignment (Exercise 4 - Multi-class classifier) with and without augmenting the validation set, trained for 20 epochs:
Without augmenting the validation set, the validation loss and accuracy are much better than those of the training set:
After augmenting the validation set in the same way as the training set, the loss and accuracy are quite close to those of the training set:
So I think augmenting the validation set is useful. I am not sure we should stop training by checking the validation accuracy alone, because it is not a smooth curve. Maybe checking both can help? Anyway, if asymmetric augmentation produces misleading results, it is meaningless to check the validation accuracy at all.
The validation set is a part of the dataset which the model has not seen during training. Validation accuracy is often used to see how well your model is generalising: you are making sure that the model does not learn to classify only images exactly like those in the training set, which would be overfitting, because the images in the validation and test sets are different from the training set. Often a good validation accuracy results in better model performance, with or without a small compromise on the training accuracy.
Did you use the same settings as in the assignment for validation or did you have custom validation parameters?
This is a very well-defined computer-graphics dataset, so I am not surprised by your plots. However, there is no significant difference between the two plots. Can you reproduce the same results with cats and dogs? What test accuracy did you achieve with each of the models?
It is just additional work. Augmenting the validation set will not give you better model performance. As you can see, the validation plot is noisy because you have augmented the images, which means you are testing your model more robustly. Too much robustness will hurt your model performance. Unseen images should be enough.
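The effect being debated here — an augmented validation set measuring something other than real-world accuracy — can be reproduced with a deliberately simple toy (everything below is invented for illustration). A classifier that is perfect on raw images scores near chance once the validation images are randomly flipped upside down:

```python
import numpy as np

rng = np.random.default_rng(7)

# toy "validation set": label 1 if the top half is brighter than the bottom
images = rng.random((200, 16, 16))
labels = (images[:, :8].mean(axis=(1, 2)) > images[:, 8:].mean(axis=(1, 2))).astype(int)

def model(batch):
    # a perfect classifier for the *unaugmented* data
    return (batch[:, :8].mean(axis=(1, 2)) > batch[:, 8:].mean(axis=(1, 2))).astype(int)

plain_acc = (model(images) == labels).mean()   # 1.0 on the raw images

# "augmented" validation: random vertical flips, as a training pipeline might do
flip = rng.random(len(images)) < 0.5
augmented = np.where(flip[:, None, None], images[:, ::-1], images)
aug_acc = (model(augmented) == labels).mean()  # much lower: flipped images are scored wrong
```

Whether that drop is a useful robustness signal or a misleading artifact depends on whether upside-down inputs can actually occur in deployment — which is exactly the disagreement in this thread.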
What test accuracy did you get for both the models?
It is always wise to stop training when you have reached a certain threshold on training accuracy; training takes up most of your time and computational resources.
The validation accuracy is an indicator of how well your model is generalising; without validation accuracy it is very difficult to know whether you are overfitting or underfitting.
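One way to combine these two points is to stop on a threshold, but gate the stop on validation accuracy so the condition reflects generalisation rather than memorisation. A schematic training loop — the epoch function is a fake stand-in that just improves over time, not a real model:

```python
import random

random.seed(0)
THRESHOLD = 0.9   # hypothetical target validation accuracy

def run_epoch(epoch):
    """Fake stand-in for one epoch of training: returns (train_acc, val_acc).
    Accuracy improves with the epoch count; validation is lower and noisier."""
    train_acc = min(0.99, 0.5 + 0.05 * epoch)
    val_acc = train_acc - 0.05 + random.uniform(-0.03, 0.03)
    return train_acc, val_acc

stopped_at = None
for epoch in range(1, 21):
    train_acc, val_acc = run_epoch(epoch)
    # gate on validation accuracy, not training accuracy, so stopping early
    # does not reward a model that is merely memorising the training set
    if val_acc >= THRESHOLD:
        stopped_at = epoch
        break
```

In Keras the same idea is usually expressed with a custom `Callback` that inspects the validation metric in `on_epoch_end`, or with the built-in `EarlyStopping` callback monitoring the validation metric.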