Why did conditional GANs perform so well despite the absence of any improvements?

I have a question about conditional GANs. Why did conditional GANs perform so well (generate better quality) despite the absence of any improvements such as WGAN-GP and SN-DCGAN? Do those problems (mode collapse and vanishing gradients) mainly show up on datasets other than MNIST, such as high-dimensional color images? I can't tell where and when I might encounter these problems, because the programming assignments on the MNIST dataset rarely run into them.

Conditional GANs result:

WGAN-GP result:

SN-DCGAN result:

Good question, @Shawn_Frost! Sorry for the delay answering.

First, one thing to remember about SN (Spectral Normalization) and WGAN is that their main goal is to improve training stability, e.g. avoiding mode collapse. Beyond that, they're not really about improving the quality of the results.

But that just explains why their results are no better than the conditional GAN's. It doesn't explain why they are worse than our conditional GAN's result. I think the reason the conditional GAN gets a nicer answer sooner is simply that it is solving an easier problem. For example, the generator might say, "This image I generated is a 9," and the discriminator only has to decide yes-or-no whether it is a realistic image of a nine. That's simpler than the generator saying, "This image is a digit," and the discriminator deciding whether it looks like any one of the 10 digits.

The model for the conditional GAN is basically identical to the code we used in Week 2 for the DCGAN. The only difference is that we append a one-hot class vector to the input of both the generator and the discriminator. If you look at the results for that DCGAN, you'll see they're comparable to the WGAN and SN-GAN.
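To make the "only difference is the one-hot vector" point concrete, here is a minimal NumPy sketch of how the conditional inputs are typically assembled. The function names (`combine_noise_and_labels`, `combine_image_and_labels`) are illustrative, not from the assignment, and the shapes assume MNIST-style 1-channel 28x28 images with 10 classes:

```python
import numpy as np

def one_hot(labels, n_classes=10):
    # Illustrative helper: turn integer class labels into one-hot row vectors.
    vecs = np.zeros((len(labels), n_classes))
    vecs[np.arange(len(labels)), labels] = 1.0
    return vecs

def combine_noise_and_labels(noise, labels, n_classes=10):
    # Generator input: the noise vector with the one-hot class vector
    # concatenated onto the end, so the generator knows which digit to draw.
    return np.concatenate([noise, one_hot(labels, n_classes)], axis=1)

def combine_image_and_labels(images, labels, n_classes=10):
    # Discriminator input: each one-hot entry is broadcast into a full
    # image-sized channel and stacked onto the image's channels, so the
    # discriminator knows which class the image is claimed to be.
    n, _, h, w = images.shape
    label_channels = one_hot(labels, n_classes)[:, :, None, None] * np.ones((n, n_classes, h, w))
    return np.concatenate([images, label_channels], axis=1)

# Example: batch of 4 noise vectors of size 64 and 4 MNIST-sized images.
noise = np.random.randn(4, 64)
images = np.zeros((4, 1, 28, 28))
labels = np.array([0, 3, 9, 1])
gen_input = combine_noise_and_labels(noise, labels)    # shape (4, 74)
disc_input = combine_image_and_labels(images, labels)  # shape (4, 11, 28, 28)
```

In the actual assignment the same idea is done with `torch.cat`; everything else about the DCGAN architecture stays as it was.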


Thank you @Wendy for a great explanation.

But I am not sure whether I am thinking in the right direction. In conditional DC-GANs, we feed a noise vector and a one_hot_vector to the generator, whereas the discriminator takes the real image together with image_one_hot_labels. The generator generates an image of that particular class, and the discriminator also generates an image for the same class. I am assuming both receive the same label as input, which makes it simpler for the generator to learn to generate images over the epochs, because it receives feedback for that specific class; otherwise the image would be judged fake no matter how good the quality is.

In DC-GANs, the generator synthesizes a random digit and asks the discriminator whether it is any one of the ten digits (0-9). Then the generator receives feedback from the discriminator. However, this randomized process means the generator and discriminator converse without any class labels over the epochs. It is hard for the generator to produce good images within 50 epochs, even with feedback from the discriminator, though it may generate better quality over longer training. The same applies to WGAN-GP and SN-DCGAN.

Is that right, or can you add some explanation that would help me think about this the right way?

Thank you for your support!

Exactly, @Shawn_Frost! Except for one point, where you wrote that for conditional DC-GANs, "the discriminator also generates an image for the same class." The discriminator doesn't generate anything; its output is a prediction of how likely it is that the input image is a real image of the specified input class. I may just be misinterpreting what you meant by that sentence. Regardless, the main idea is exactly right.