In the network that we implemented in the assignment, data augmentation is achieved by two layers: random flip and random rotation. I understand how that means that each image is flipped and rotated before being passed to the following layers. However, I do not see how we actually get more images. That is, I see that the images are transformed, but do not see how they are augmented.
The dataset is augmented in the sense that there are more images, right? The rotated and flipped images are different. They were not in the original dataset.
My question is, where does the code say that the new images are augmented rather than replacing the original images?
That is a great question indeed.
I think it is important to note that the new images are generated during the training phase and are fed into the NN along with the original images.
If anything is still unclear, please feel free to ask more!
Where do I see this in the code? All I see is that some transformations are applied to an image. In particular, I don’t see a copy of the original image being made before the transformation is applied… Neither do I see that more than one augmented image is generated from any given original image.
The following is an excellent tutorial on data augmentation layers in TensorFlow:
Note the process of preparing the dataset. If the intent is training, preparation includes the data augmentation layers, which enrich the dataset with augmented samples. For testing, the dataset is not augmented. This approach gives you a clear distinction between datasets that do and do not include additional (fake, i.e. augmented) samples.
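The pattern the tutorial describes can be sketched like this (a minimal sketch; the function name `prepare`, the dummy data, and the exact augmentation factors are illustrative, not taken from the tutorial):

```python
import tensorflow as tf

# Illustrative augmentation pipeline: a random horizontal flip followed by
# a random rotation of up to ±10% of a full circle.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])

def prepare(ds, training=False):
    # Augment only the training pipeline; evaluation data stays untouched,
    # so the two dataset variants are clearly distinguished.
    if training:
        ds = ds.map(lambda x, y: (augment(x, training=True), y),
                    num_parallel_calls=tf.data.AUTOTUNE)
    return ds.prefetch(tf.data.AUTOTUNE)

# Tiny dummy dataset: 4 RGB images with integer labels.
images = tf.random.uniform((4, 32, 32, 3))
labels = tf.constant([0, 1, 0, 1])
base = tf.data.Dataset.from_tensor_slices((images, labels)).batch(2)
train_ds = prepare(base, training=True)  # augmented
test_ds = prepare(base)                  # not augmented
```

Note that augmentation happens lazily, per batch, each time the dataset is iterated; the stored dataset itself never gains extra samples.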
The layers included in the model through data_augmenter() are RandomFlip and RandomRotation. The ‘Random’ indicates that each pass through the network will create randomly augmented images used to train the model. So, there will (randomly) be different images with each pass through the network. No copies are made of the original images; something akin to the original images may result from the RandomFlip and RandomRotation.
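For concreteness, here is a minimal sketch of what `data_augmenter()` amounts to (the exact flip mode and rotation factor in the assignment may differ):

```python
import tensorflow as tf

def data_augmenter():
    # A small Sequential model holding the two augmentation layers named
    # above; the flip mode and rotation factor here are illustrative.
    return tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.2),
    ])

augmenter = data_augmenter()
image = tf.random.uniform((1, 160, 160, 3))  # one dummy RGB image

# With training=True, each call applies a fresh random transformation,
# so two passes over the same stored image generally produce different
# tensors of the same shape.
out1 = augmenter(image, training=True)
out2 = augmenter(image, training=True)
```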
Note also that the documentation states “During inference time, the output will be identical to input.” So during inference, there will be no flip or rotation (with the default training=False).
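A quick way to see this inference-time behavior directly (a minimal sketch using the same two layer types):

```python
import tensorflow as tf

flip = tf.keras.layers.RandomFlip("horizontal")
rotate = tf.keras.layers.RandomRotation(0.2)

image = tf.random.uniform((1, 64, 64, 3))

# In inference mode (training=False, the default during model.predict),
# both augmentation layers act as identity functions: the output tensor
# matches the input exactly, with no flip or rotation applied.
out = rotate(flip(image, training=False), training=False)
```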
Did you mean to write that the images will be different with each pass through the Training Set rather than through the network? So, if we perform 10 epochs of training, then we will train on 10 random variations of each image?
Each time the training set passes through the network, each of the pictures in the training set will be augmented randomly. This implies that every epoch trains on (randomly) slightly different images and no new images are added to the training set. See, e.g., this discussion.
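This can be checked directly: repeatedly passing the same stored image through the augmentation layers in training mode yields a different randomly transformed variant on each pass, while the training set itself never grows (a minimal sketch; the layer factors are illustrative):

```python
import tensorflow as tf

tf.random.set_seed(0)  # for reproducibility of this sketch

aug = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),
])

image = tf.random.uniform((1, 32, 32, 3))

# Simulate three "epochs": the single stored image yields a differently
# augmented variant on each pass, but no new image is added anywhere.
variants = [aug(image, training=True) for _ in range(3)]
```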