When the video started, I wanted to pause for a second and think about the single-image issue in our face verification task.
My first intuition was to perform image augmentation. Is that a viable solution? Why did we decide to leverage the degree of difference instead?
You can always adopt image augmentation, but suppose you use conventional techniques such as flips, rotations, etc. How many meaningfully different images do you think you can create? The network needs images that actually differ from one another to learn something new, and you only have a single image to begin with.
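To make this concrete, here is a minimal sketch (using NumPy as a stand-in for real image data) of how few distinct images conventional flip/rotate augmentation can produce from one original. Rotations by multiples of 90 degrees combined with flips form the dihedral group, so you get at most 8 distinct variants, and fewer if the image has any symmetry:

```python
import numpy as np

def dihedral_variants(img):
    """Generate all flip/rotate variants of a single image
    (the 8 elements of the dihedral group D4)."""
    variants = []
    for k in range(4):                   # 0, 90, 180, 270 degree rotations
        rot = np.rot90(img, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))  # each rotation plus a horizontal flip
    # Deduplicate: symmetric images yield fewer than 8 distinct variants
    unique = []
    for v in variants:
        if not any(np.array_equal(v, u) for u in unique):
            unique.append(v)
    return unique

img = np.arange(9).reshape(3, 3)         # a toy 3x3 "image" with no symmetry
print(len(dihedral_variants(img)))       # prints 8
```

Small random crops, color jitter, or slight affine warps add a few more variants, but they all stay close to the original face, which is exactly the problem discussed above.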
Additionally, I don’t think GAN-based augmentation would be viable here, since one image is not enough to train or even fine-tune a GAN. You could always use a pre-trained GAN, but I am not sure it would work, since the GAN must produce images “related” to the image in question. Even with other generative models, which may be viable, the same question remains: how many images will you be able to create?
Even if we somehow create 100 images through augmentation, most of them will be similar to each other or to the original image, so it doesn’t make much sense to train a network on those 100 (possibly near-duplicate) images. That’s why we have adopted one-shot learning, where, as you said, the network learns a degree of difference between a pair of images instead.
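The one-shot idea can be sketched in a few lines: a shared encoder embeds both images, and the distance between the embeddings acts as the "degree of difference". Note that `embed` below is a hypothetical stand-in (a normalized flatten) for a trained Siamese/face encoder, used only so the example runs:

```python
import numpy as np

def embed(img):
    """Stand-in for a trained encoder (e.g. a Siamese CNN);
    here just an L2-normalized flattening so the example is runnable."""
    v = img.astype(float).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def verify(img_a, img_b, threshold=0.5):
    """One-shot verification: a small embedding distance
    is interpreted as 'same identity'."""
    distance = np.linalg.norm(embed(img_a) - embed(img_b))
    return distance < threshold

a = np.ones((4, 4))
b = np.ones((4, 4))      # identical image -> distance 0
c = np.eye(4)            # different pattern -> large distance
print(verify(a, b))      # prints True
print(verify(a, c))      # prints False
```

The key point is that the encoder is trained once on many identities (e.g. with a triplet or contrastive loss), so verifying a new person needs only their single reference image, not a per-person training set.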
However, this is an open research problem. If you think you can somehow use image augmentation in the face verification task (some researchers may have tried it too), you are most welcome to try it and share your work with the research community. I hope this helps.