Effect of image augmentation on model performance

I’ll briefly state the problem first, followed by a detailed description of the situation.

I applied three augmentations to the input images of my CNN model: random rotation within a range of 30 degrees, vertical flipping, and random brightness. I used TensorFlow and ImageDataGenerator. I carried out three iterations. In the first, I applied rotation and flipping; in the second, only brightness; and in the third, all three techniques. The result was surprising: the AUC score for every class (the problem is multi-class, multi-label classification) was worst in the third iteration.
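For reference, the three iterations described above could be configured roughly as follows. This is only a sketch: the brightness range of (0.8, 1.2) is an assumed value, since the post does not give the one actually used, and `AUGMENTATION_RUNS` and `make_generator` are hypothetical names introduced for illustration.

```python
# Keyword arguments for tf.keras.preprocessing.image.ImageDataGenerator,
# one set per iteration described in the post.
ROTATE_FLIP = {
    "rotation_range": 30,   # random rotation drawn from [-30, 30] degrees
    "vertical_flip": True,  # random top-to-bottom flip
}
BRIGHTNESS = {"brightness_range": (0.8, 1.2)}  # assumed range, not from the post

AUGMENTATION_RUNS = {
    "run_1_rotate_flip": dict(ROTATE_FLIP),
    "run_2_brightness": dict(BRIGHTNESS),
    "run_3_all": {**ROTATE_FLIP, **BRIGHTNESS},
}


def make_generator(run_name):
    """Build an ImageDataGenerator for one run (hypothetical helper)."""
    # Imported lazily so the configurations can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    return ImageDataGenerator(**AUGMENTATION_RUNS[run_name])
```

Keeping the three configurations side by side makes it easier to confirm that the third run differs from the first two only in the combination of transforms, and not in some other setting.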

Description of the model and the dataset: the training set contains roughly 500 thousand chest X-ray images covering 15 different lung diseases, and each image can show one or more of these diseases. So this is a multi-label, multi-class image classification problem. I picked EfficientNetB4 as the convolutional base. The dataset is not balanced, so I used a weighted loss function that gives more weight to the classes with fewer positive samples in the dataset.
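The post does not say how the class weights were computed. One common scheme, shown here as a NumPy sketch and not necessarily the one used, weights the positive and negative terms of the binary cross-entropy for each label by the frequency of the opposite class. The names `class_frequency_weights` and `weighted_bce` are illustrative.

```python
import numpy as np


def class_frequency_weights(labels):
    """Per-class weights from a (n_samples, n_classes) 0/1 label matrix.

    Positive terms are weighted by the fraction of negatives and negative
    terms by the fraction of positives, so rare positives count for more.
    """
    labels = np.asarray(labels, dtype=float)
    pos_freq = labels.mean(axis=0)
    return 1.0 - pos_freq, pos_freq  # (w_pos, w_neg)


def weighted_bce(y_true, y_prob, w_pos, w_neg, eps=1e-7):
    """Mean weighted binary cross-entropy over samples and classes."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    loss = -(w_pos * y_true * np.log(y_prob)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(loss.mean())


# Toy example: 4 images, 2 labels; label 0 is rare (1 positive in 4).
y = np.array([[1, 1], [0, 1], [0, 0], [0, 1]])
w_pos, w_neg = class_frequency_weights(y)
```

In this toy example the rare label gets a positive weight of 0.75 and the common label a positive weight of 0.25, so a missed positive on the rare label is penalised three times as heavily.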

Kindly share your insights as to why the model is performing poorly when I apply the 3 data augmentation techniques together, when ideally it should give better results!

What led you to apply these three augmentation techniques? Did you diagnose the model first?

The idea to apply these augmentations mainly stems from insights gained from the dataset. Some images are flipped, some are rotated, and there is variation in the brightness and contrast values.

Perhaps the level of augmentation applied in the third iteration could have been excessive or not applied correctly.

It might be. The level of augmentation applied might have caused the issue. But another question that strikes me is whether applying more and more augmentation techniques would necessarily improve the model performance. More augmentations could have reduced the model’s capability to learn correct features.

No, not necessarily. Too much augmentation can negatively impact performance, as your model may become more reliant on synthetic features and learn to disregard the original features.


Thanks! I’ll post an update if I get any new insights from further iterations.

I think you should slightly reframe this thought. You don’t tell us whether this performance degradation is on the training, validation, or test data. I infer it could be the last of these, which suggests to me that the problem is not that the model is inadequately learning correct features. In fact, it is learning them too well. The problem is that the features of the training data have themselves become incorrect, and have diverged too far from the operational reality.
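To make that question concrete, one could report the per-class AUC separately for each data split and each augmentation run. The sketch below is an illustration only: `binary_auc` and `per_class_auc_by_split` are hypothetical helpers, not part of TensorFlow, and the pairwise AUC is written for clarity and would be too slow for a dataset of this size.

```python
import numpy as np


def binary_auc(y_true, y_score):
    """ROC AUC as the probability that a positive outranks a negative.

    Pairwise O(P*N) version for clarity; for large sets a library routine
    such as sklearn.metrics.roc_auc_score is the practical choice.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined when only one class is present
    diff = pos[:, None] - neg[None, :]
    # Ties between a positive and a negative score count as half a win.
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def per_class_auc_by_split(splits):
    """Map each split name to a list of per-class AUC values.

    `splits` maps a name to (labels, scores), both (n_samples, n_classes).
    """
    report = {}
    for name, (labels, scores) in splits.items():
        labels, scores = np.asarray(labels), np.asarray(scores)
        report[name] = [binary_auc(labels[:, c], scores[:, c])
                        for c in range(labels.shape[1])]
    return report
```

Called with something like `{"train": (y_tr, p_tr), "val": (y_va, p_va), "test": (y_te, p_te)}`, this gives one row of AUC values per split. If the third run still scores well on training data but worse on validation and test data, that would be consistent with the augmented training images having drifted away from what the model sees in practice.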

By the way, I recall that at one time you were considering applying transfer learning to a model pretrained on ImageNet. Did you go with that? Is this current thread just about the transfer learning portion? Or are you training from scratch?

P.S. I also notice that all of your posts are in the AI in the Real World category. You might consider cross-posting or linking to the AI for Diagnosis forum. I know there are MDs and healthcare professionals over there, and they might have some interest in, and advice on, your approach. HTH