Advantages of using Pix2Pix GAN over traditional supervised learning for image translation

I’m wondering if we can use traditional supervised learning for image translation (e.g., satellite image → map routes). For example, with the same pairwise training data, we could use a U-Net to predict each pixel’s class, such as “river”, “road”, or “building”, and then apply a corresponding coloring to generate the map routes.
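To make the second step of that idea concrete, here is a minimal sketch of the "coloring" stage, assuming the U-Net's argmax output is an (H, W) array of class IDs and assuming a hypothetical palette (the class IDs and colors below are illustrative, not from any assignment):

```python
import numpy as np

# Hypothetical palette: map each predicted class ID to a map-style RGB color.
PALETTE = {
    0: (70, 130, 180),    # "river"    -> steel blue
    1: (255, 255, 255),   # "road"     -> white
    2: (160, 160, 160),   # "building" -> grey
}

def classes_to_map(class_ids: np.ndarray) -> np.ndarray:
    """Turn an (H, W) array of per-pixel class IDs (e.g. U-Net argmax output)
    into an (H, W, 3) uint8 RGB image that looks like a map."""
    h, w = class_ids.shape
    rgb = np.zeros((h, w, 3), dtype=np.uint8)
    for cls, color in PALETTE.items():
        rgb[class_ids == cls] = color
    return rgb

# Toy 2x2 "prediction" covering all three classes.
pred = np.array([[0, 1],
                 [2, 0]])
map_img = classes_to_map(pred)  # shape (2, 2, 3)
```

Note this pipeline is deterministic: one class map always yields one output image, which is part of what the GAN discussion below contrasts with.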

If this works theoretically, what would be the advantages/disadvantages of using a GAN instead? Would GAN require less data?

Also, roughly how much data do we need for a reasonably good result for Pix2Pix GAN? 500? 1,000? 10,000?

Hi Mengying!

Yes, it’s possible to use traditional supervised learning for image translation, provided you have enough pairs of images. But I’m not sure how good it would be. Will it converge? Will it generalize well? These questions can only be answered by experiments. That said, I’m quite sure GANs would outperform it: the nature of their architectures and training makes them more robust, and their results can be more diverse and realistic. It’s also worth investigating which details/features each architecture attends to when producing the translated images. That is complicated to reason about, but the former approach may be less efficient, since we first segment the images and then map the segmented images to map routes. It would be interesting to see which features it learns to rely on when translating the segmented image.
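On "the nature of their training": what makes Pix2Pix different from plain supervised regression is its generator objective, which combines an adversarial term (fool the discriminator) with a paired L1 term. Here is a minimal NumPy sketch of that combined loss, assuming sigmoid discriminator logits and λ = 100 as in the Pix2Pix paper (the function and argument names are illustrative):

```python
import numpy as np

def pix2pix_generator_loss(disc_fake_logits, fake_img, real_img, lam=100.0):
    """Generator loss = adversarial BCE term + lambda * L1 reconstruction term.

    disc_fake_logits: discriminator's raw logits on the generated image.
    fake_img, real_img: generated output and the paired ground-truth target.
    """
    # Adversarial term: the generator wants D(G(x)) -> 1,
    # i.e. minimize -log(sigmoid(logits)).
    probs = 1.0 / (1.0 + np.exp(-disc_fake_logits))
    adv = -np.mean(np.log(probs + 1e-8))
    # L1 term: keep the translation pixel-wise close to the paired target.
    l1 = np.mean(np.abs(fake_img - real_img))
    return adv + lam * l1
```

A pure per-pixel classifier only has the equivalent of the second term; the adversarial term is what pushes outputs toward the manifold of realistic-looking images rather than a "safe" average.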

As for data, GANs generally need plenty of it to train well, but given the same X examples, I would still expect a GAN to outperform the former approach on a task of this complexity. So yes, a GAN might need comparatively less data to beat the former approach, but at baseline, everything requires a good amount of data.

Regarding the number of samples required for Pix2Pix, it depends on your problem. A very complex translation, such as converting one kind of tissue image into another, will likely need more data than satellite images → map routes. In general it is good to have more data, but data quality matters even more.


Thanks for the explanation! GAN seems to be more powerful and robust than I imagined!

Re the data size, I saw in the programming assignment that the loaded data contains only about ~2,000 pairs. Is that the same training set the pretrained model was trained on? That would be quite surprising, because 2,000 seems like a small number for a deep learning model.

No, the pretrained model need not have been trained with only ~2,000 pairs; it was probably more. Also, in our assignments we can generally observe good results even on smaller datasets, but that is because we only evaluate performance on the training data. It is easy for a model to do well on the training data when run for many epochs, even with a small dataset, yet it might still fail to generalize; we rarely evaluate on a test set in our assignments.

Great. Thanks for the swift answer!
