I followed the SN-GAN notebook as it is without any edits, and it appears that the model took way too long to train, longer than a normal DCGAN. Why is that?
Hi Mohammed!
Welcome to the community!
Well, it really depends on the architecture: how deep it is, what additional methods are involved, what loss function is used, and what goes into the computational graph. So it's normal to see a longer training time. One can't say exactly why a particular model takes longer, but typically it's deeper or has more parameters to backpropagate through than the standard DCGANs you've tried.
Regards,
Nithin
@Mohammed_Ali4,
If you are asking why SN-DCGAN is slower than DCGAN with the exact same model, where the only difference is that the discriminator for SN-DCGAN wraps each call to nn.Conv2d with a call to nn.utils.spectral_norm, like this:
nn.utils.spectral_norm(nn.Conv2d(input_channels, output_channels, kernel_size, stride))
The short answer is that the extra work to do the spectral normalization takes time. The point of the SN is not to improve how long it takes to train, but to stabilize the training of the discriminator.
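To make that concrete, here's a rough sketch of what the wrapping looks like in a single discriminator block. The channel arguments and the BatchNorm/LeakyReLU choices are just typical DCGAN defaults for illustration, not necessarily exactly what the notebook uses:

```python
import torch.nn as nn

def make_disc_block(in_channels, out_channels, kernel_size=4, stride=2, use_sn=True):
    # One discriminator block; with use_sn=True the conv's weight is divided
    # by an estimate of its largest singular value on every forward pass.
    conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride)
    if use_sn:
        conv = nn.utils.spectral_norm(conv)
    return nn.Sequential(
        conv,
        nn.BatchNorm2d(out_channels),
        nn.LeakyReLU(0.2),
    )
```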
The “Spectral Norm” section in the notebook gives some detail on what’s involved in calculating the spectral norm to give you a little sense of the extra work it’s doing. The section “DCGAN Discriminator” mentions that you can use Pytorch’s nn.utils.remove_spectral_norm during inference (i.e. when using the model to make predictions after training) to improve runtime speed. The fact that the Pytorch implementors provide this function is a good hint that they know that calculating spectral norm can be time-consuming.
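For example, stripping the hooks after training might look something like this (just a sketch; `strip_spectral_norm` is a hypothetical helper, not something from the notebook):

```python
import torch.nn as nn

def strip_spectral_norm(disc):
    # Remove the spectral-norm hooks from every wrapped Conv2d so that
    # inference skips the power-iteration step entirely.
    for module in disc.modules():
        if isinstance(module, nn.Conv2d):
            try:
                nn.utils.remove_spectral_norm(module)
            except ValueError:
                pass  # this conv layer was never wrapped
    return disc
```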
All of that said, I did not see a huge difference in time for the particular case of the SN-GAN notebook. When I ran it with nn.utils.spectral_norm wrapping the nn.Conv2d layers it took about 14 min., and about 11 min. without it (very roughly). If you were seeing a much bigger difference between regular DCGAN and SN-DCGAN, it was probably some other environmental issue. For example, if Coursera happened to be low on resources when you ran your SN-DCGAN, you might have seen a slowdown due to resources being shared between you and other users.
I might have framed it incorrectly. I didn't mean the time per epoch due to computation; I meant the time to learn to produce reasonable results. Without SN, the GAN produced reasonable results in 3000 epochs, but for SN-GAN, only after 10000 epochs did the model start producing anything but grey noise. If it improves stability, why did it take so long to stabilize training?
Ah, I see, @Mohammed_Ali4. Good question.
So, by more stable, they mean avoiding training failures like vanishing gradients and mode collapse. In this case, spectral normalization tries to maintain 1-Lipschitz continuity, which means keeping the discriminator's gradients bounded. Basically, no big gradient steps. You can think of it as moving more slowly and cautiously to avoid falling into the big hole of mode collapse.
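If you want to see the 1-Lipschitz constraint in action, here's a small, self-contained sketch (my own example, not from the notebook) that checks the largest singular value of a spectrally-normalized layer's weight:

```python
import torch
import torch.nn as nn

# Spectral norm rescales a layer's weight so that its largest singular
# value is approximately 1, making the layer roughly 1-Lipschitz.
layer = nn.utils.spectral_norm(nn.Linear(64, 64))

x = torch.randn(8, 64)
for _ in range(20):
    _ = layer(x)  # each forward pass runs one power-iteration step

# Largest singular value of the normalized weight: should be close to 1.0
print(torch.linalg.matrix_norm(layer.weight.detach(), ord=2))
```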
I have a follow-up to this: using SN, can one use the W-Loss without the gradient penalty?
@Mohammed_Ali4, I think you’re starting to go beyond what I’ve thought through.
From my understanding, the SNGAN lab is showing SNGAN as an alternative to WGAN. SNGAN uses spectral normalization on the Conv2d layers in the discriminator's model, but it calculates loss using the same BCE loss we use with DCGAN, while WGAN calculates loss using the W-loss formula discussed in the lecture (taking the difference between the critic's predictions on fake and real images and, in WGAN-GP, adding a gradient penalty term).
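For reference, here's a rough sketch of what that WGAN-GP critic loss looks like in code. The function name, the `gp_weight` value of 10, and the assumption that `crit`, `real`, and `fake` are a critic module and matching image batches are my own choices for illustration, not taken from the notebook:

```python
import torch

def wgan_gp_critic_loss(crit, real, fake, gp_weight=10.0):
    # W-loss core: minimizing this pushes the critic to score real images
    # higher than fakes.
    core_loss = crit(fake).mean() - crit(real).mean()

    # Gradient penalty (the "GP" in WGAN-GP): score images interpolated
    # between real and fake, and push the critic's gradient norm toward 1.
    epsilon = torch.rand(real.size(0), 1, 1, 1,
                         device=real.device, requires_grad=True)
    mixed = epsilon * real + (1 - epsilon) * fake
    mixed_scores = crit(mixed)
    gradient = torch.autograd.grad(
        outputs=mixed_scores, inputs=mixed,
        grad_outputs=torch.ones_like(mixed_scores),
        create_graph=True,
    )[0]
    grad_norm = gradient.view(gradient.size(0), -1).norm(2, dim=1)
    penalty = ((grad_norm - 1) ** 2).mean()

    return core_loss + gp_weight * penalty
```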
I think what you’re asking about is trying to use spectral normalization in conjunction with W-loss. I haven’t really thought about that, but a little googling pointed me at this article which seems to talk about combining the two: A Comparison of WGAN Implementations (WGAN-GP and WGAN-SN) | by Brad Brown | Medium
I hope it’s helpful!