I was wondering why the beginning of the SN-GAN notebook explains how spectral normalization helps satisfy the 1-Lipschitz continuity requirement for W-GAN, but later on we train a DCGAN (with BCE loss) instead?

Would it still make sense to use SN with BCE? To my understanding, the vanishing gradient problem comes from the loss function, so it cannot go away even with such a weight normalization technique. Could you please correct me if I’m wrong?

Hey @sohonjit.ghosh
I don’t know if learners are supposed to tag mentors directly. So I’m sorry if we’re not.
But you helped before, so could you please take a look at my question again? It really interests me.

First things first: you can tag any mentor to help you, and I felt really amazing that you tagged me. Sorry for the delay, though.

Now, spectral norm is a regularization technique and not a loss function. You can see in the code that the discriminator block’s convolutional layers are wrapped in the “nn.utils.spectral_norm” function, so that each layer’s weight is divided by its largest singular value and its spectral norm stays close to 1 (the weights are rescaled, not clipped).
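To make the wrapping concrete, here is a minimal sketch of what it could look like. The helper name `make_disc_block` and the layer sizes are my own assumptions for illustration, not the notebook's exact code. It wraps a convolution in `nn.utils.spectral_norm`, runs a few forward passes in training mode (each one performs a power-iteration step), and then checks the largest singular value of the normalized weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_disc_block(in_ch, out_ch, kernel_size=4, stride=2):
    # Hypothetical helper (an assumption, not the notebook's code):
    # a discriminator block whose conv layer is spectrally normalized.
    return nn.Sequential(
        nn.utils.spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size, stride)),
        nn.LeakyReLU(0.2),
    )

block = make_disc_block(1, 64)
conv = block[0]
x = torch.randn(8, 1, 28, 28)  # stand-in batch of 28x28 grayscale images

# In training mode every forward pass runs one power-iteration step,
# so the estimate of the largest singular value improves over time.
block.train()
with torch.no_grad():
    for _ in range(100):
        out = block(x)

def largest_singular_value(weight):
    # Flatten the conv kernel to (out_channels, -1), as spectral_norm does.
    return torch.linalg.svdvals(weight.detach().reshape(weight.shape[0], -1))[0].item()

sigma_before = largest_singular_value(conv.weight_orig)  # raw parameter
sigma_after = largest_singular_value(conv.weight)        # normalized weight

print(f"largest singular value, raw weight:        {sigma_before:.3f}")
print(f"largest singular value, normalized weight: {sigma_after:.3f}")
```

The second number should come out close to 1, which is the sense in which the layer is constrained: the weight is divided by an estimate of its largest singular value rather than clipped element by element.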

Since you asked about BCE loss: even though it has lots of disadvantages, it is by far the most widely used loss in practice. It’s simple, and even though there is a chance of mode collapse, in most practical cases there is no need for other loss functions. You will see in the later course material that this is the case. One thing you can always do is learn about all the loss functions and try them in turn in your projects, then select the one that gives the better result. I myself treat the choice of loss function as a “hyperparameter” when doing any GAN-related work.
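As a rough illustration of treating the loss as a “hyperparameter”, the discriminator loss could be selected by name. This is only a sketch: the function name `disc_loss` and the `kind` argument are assumptions of mine, and the W-loss branch on its own does not enforce the 1-Lipschitz constraint.

```python
import torch
import torch.nn.functional as F

def disc_loss(fake_logits, real_logits, kind="bce"):
    # Hypothetical helper: choose the discriminator/critic loss by name.
    if kind == "bce":
        # Fakes are labelled 0, reals are labelled 1.
        fake_loss = F.binary_cross_entropy_with_logits(
            fake_logits, torch.zeros_like(fake_logits))
        real_loss = F.binary_cross_entropy_with_logits(
            real_logits, torch.ones_like(real_logits))
        return (fake_loss + real_loss) / 2
    if kind == "wasserstein":
        # W-loss for the critic: push real scores up and fake scores down.
        return fake_logits.mean() - real_logits.mean()
    raise ValueError(f"unknown loss kind: {kind}")

# Usage with stand-in discriminator outputs.
fake_logits = torch.zeros(4, 1)
real_logits = torch.zeros(4, 1)
bce_value = disc_loss(fake_logits, real_logits, kind="bce").item()
w_value = disc_loss(fake_logits, real_logits, kind="wasserstein").item()
```

Swapping `kind` between runs then becomes just another setting to tune alongside the learning rate and batch size.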

Thanks for the answer
Yeah, I understand that spectral norm is a regularization technique and not a loss function.
But it’s also not the spectral norm itself that is supposed to help with the vanishing gradient and mode collapse problems, right? Rather, it’s the Earth Mover’s distance. And one way to make sure that W-loss is valid is to encourage the model to satisfy 1-Lipschitz continuity, right? That is where the spectral norm comes into play.
But what is the point of using spectral norm with BCE (as happens in the optional assignment notebook), given that this loss (meaning BCE) imposes no such requirement?

See, what normally happens is that as a GAN trains, it produces fake images, but the distribution of these fake images is more or less disjoint from the real image distribution. 1-Lipschitz continuity acts as a connector between the two. For WGAN, enforcing 1-Lipschitz continuity on the critic is what keeps the W-loss a valid, well-behaved approximation of the Earth Mover’s distance. But that doesn’t mean we can’t make use of 1-Lipschitz continuity in other kinds of GANs, i.e. with BCE loss. Spectral norm is one way of enforcing that constraint on the discriminator: it limits how sharply the discriminator’s output can change, which adds that connectivity between the two disjoint distributions. Hence it is used in conjunction with BCE loss.
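For reference, the constraint discussed above can be written out as follows. This is a sketch under the usual simplifying assumptions: each layer is treated as a linear map with weight matrix W_l (for convolutions, the kernel reshaped into a matrix), and the activations are themselves 1-Lipschitz (ReLU, or LeakyReLU with slope at most 1). The symbols D, W_l and \sigma are my own notation, not taken from the notebook.

```latex
% 1-Lipschitz condition on the discriminator (or critic) D:
\[
  \lvert D(x_1) - D(x_2) \rvert \le \lVert x_1 - x_2 \rVert
  \quad \text{for all } x_1, x_2 .
\]

% Spectral normalization divides each weight matrix by its
% largest singular value \sigma(W_l):
\[
  \bar{W}_l = \frac{W_l}{\sigma(W_l)}, \qquad \sigma(\bar{W}_l) = 1 .
\]

% With 1-Lipschitz activations, the Lipschitz constant of the whole
% network is bounded by the product of the layers' spectral norms:
\[
  \lVert D \rVert_{\mathrm{Lip}} \le \prod_{l=1}^{L} \sigma(\bar{W}_l) = 1 .
\]
```

Nothing in this bound depends on which loss is used, which is why the same constraint can be combined with BCE as well as with W-loss.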

That is exactly what I was asking. Thanks
By the way, I just compared two models trained with BCE loss. One with spectral norm and the other without it.
It turned out that in my case, spectral norm helped the model get rid of some low-level artifacts inherent to transposed-convolution-based generators.

It’s pretty amazing that you compared the two and found the differences. Yes, you will see some changes for sure, but as you go further in the specialization you will see that the differences are minimal.

I would ask you to share the GitHub link to your work, if you can, for others to see. You will get more exposure this way, and other learners will also be encouraged to do more.

You are doing great and you are asking the best of questions. I am really glad that you have taken this up with such dedication, and I am delighted that I was of some help to you.