SN-GAN optional assignment

Hello

I was wondering why, at the beginning of the SN-GAN notebook, there is an explanation of how spectral normalization can help satisfy the 1-L continuity requirement for W-GAN, but later on we proceed with training a DCGAN (with BCE loss) instead?

Would it still make sense to use SN with BCE? To my understanding, the vanishing gradient problem cannot go away even with such a weight normalization technique, because it stems from the loss function itself. Could you please correct me if I'm wrong?

Hey @sohonjit.ghosh
I don't know if learners are supposed to tag mentors directly, so I'm sorry if we're not allowed to.
But you helped me before, so could you please take a look at my question again? It really interests me.

Hey @IvanK,

First things first: you can tag any mentor for help, and I'm really glad you tagged me. Sorry for the delay, though.

Now, spectral norm is a regularization technique and not a loss function. You can see in the code that the discriminator block's convolutional layers are wrapped in the `nn.utils.spectral_norm` function, which rescales each weight matrix by its largest singular value so that its spectral norm is 1 (note this is a rescaling, not weight clipping).
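
To make that concrete, here is a minimal sketch of such a spectrally normalized block (the channel sizes are illustrative, not the assignment's exact values):

```python
import torch
import torch.nn as nn

# Illustrative discriminator block: wrapping the convolution in
# spectral_norm re-parameterizes it so that, at each forward pass,
# the weight is divided by an estimate of its largest singular value
# (computed by power iteration), keeping the layer's spectral norm ~1.
disc_block = nn.Sequential(
    nn.utils.spectral_norm(nn.Conv2d(1, 16, kernel_size=4, stride=2)),
    nn.BatchNorm2d(16),
    nn.LeakyReLU(0.2),
)

x = torch.randn(8, 1, 28, 28)  # a batch of 28x28 grayscale images
print(disc_block(x).shape)     # torch.Size([8, 16, 13, 13])
```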

Since you brought up the BCE loss: even though it has lots of disadvantages, it is used far more widely in practice. It is simple, and even though there is a risk of mode collapse, in most practical cases there is no real need to reach for other loss functions; you will see later in the course material that this is the case. One thing you can always do is learn all the loss functions and try them interchangeably in your projects, then select the one that gives the better result. I myself treat the loss function selection as a "hyperparameter" when doing any GAN-related work.
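
For example, here is a minimal sketch of the two discriminator-side losses discussed in this thread, written so they can be swapped like a hyperparameter (I'm assuming `disc_real_pred` and `disc_fake_pred` are the discriminator's raw outputs, i.e. logits):

```python
import torch
import torch.nn.functional as F

def bce_disc_loss(disc_real_pred, disc_fake_pred):
    # Standard BCE discriminator loss on logits:
    # real images should score 1, fakes should score 0.
    real_loss = F.binary_cross_entropy_with_logits(
        disc_real_pred, torch.ones_like(disc_real_pred))
    fake_loss = F.binary_cross_entropy_with_logits(
        disc_fake_pred, torch.zeros_like(disc_fake_pred))
    return (real_loss + fake_loss) / 2

def w_critic_loss(crit_real_pred, crit_fake_pred):
    # W-loss for the critic: approximates the Earth Mover's distance,
    # valid only if the critic is kept 1-Lipschitz (e.g. spectral norm).
    return crit_fake_pred.mean() - crit_real_pred.mean()
```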

I hope I could help.

Best
Arijit

Thanks for the answer.
Yeah, I understand that spectral norm is a regularization technique and not a loss function.
But it's also not the spectral norm itself that is supposed to help with the vanishing gradient and mode collapse problems, right? It's rather the Earth Mover's distance. And one way to make sure that W-loss is valid is to encourage the model to satisfy 1-L continuity, right? That is where the spectral norm comes into play.
But what is the point of using spectral norm with BCE (as happens in the optional assignment notebook), given that there is no requirement we have to meet with this loss (meaning BCE)?

Hey @IvanK,

See, what normally happens is that as GAN training proceeds, the generator produces fake images, but the fake image distribution is somewhat disjoint from the real image distribution. 1-L continuity acts as a connector. For WGAN, enforcing 1-L continuity makes the Earth Mover's distance continuous and differentiable. But that doesn't mean we can't make use of 1-L continuity in other GAN setups, e.g. with BCE loss. Spectral norm is like a bulkier brother of 1-L continuity that adds that connectivity between the two disjoint distributions; hence it is used in conjunction with BCE loss as well.
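
To put it concretely: a critic $f$ is 1-L (1-Lipschitz) continuous when

$$\lVert f(x_1) - f(x_2) \rVert \le \lVert x_1 - x_2 \rVert \quad \text{for all } x_1, x_2,$$

i.e. its gradient norm never exceeds 1. A linear layer's Lipschitz constant is its largest singular value, so rescaling each weight matrix to spectral norm 1, together with 1-Lipschitz activations such as LeakyReLU, keeps the whole critic from stretching distances by more than a factor of 1.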

I would suggest you read this amazing paper to get a better hold of the concept of 1-L continuity in GANs: [1807.00751] Understanding the Effectiveness of Lipschitz-Continuity in Generative Adversarial Nets (arxiv.org). Don't be scared by the number of pages; it has a very long appendix.

Best
Arijit

That is exactly what I was asking. Thanks!
By the way, I just compared two models trained with BCE loss: one with spectral norm and the other without it.
It turned out that in my case, spectral norm helped the model get rid of some low-level artifacts inherent to transposed-convolution-based generators.
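
In case it is useful to others, here is roughly how the two models can be kept identical except for that one setting (a simplified sketch; the channel sizes here are made up):

```python
import torch.nn as nn

# Hypothetical helper: build the same DCGAN-style discriminator block
# with or without spectral norm, so two BCE-trained models differ only
# in that one flag.
def disc_block(in_ch, out_ch, use_sn):
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
    if use_sn:
        conv = nn.utils.spectral_norm(conv)
    return nn.Sequential(conv, nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.2))

disc_sn = nn.Sequential(disc_block(1, 16, True), disc_block(16, 32, True))
disc_plain = nn.Sequential(disc_block(1, 16, False), disc_block(16, 32, False))
```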

Hey @IvanK,

It's nice that you got a clear understanding.

It's pretty amazing that you compared the two and found the differences. Yes, you will see some changes for sure, but as you go further in the specialization you will find that the differences are often minimal.

Best
Arijit

Hey @IvanK,

I would request you to share the GitHub link of your work, if you can, for others to see. You will get more exposure this way, and other learners will also be encouraged to do more.

You are doing great, and you are asking the best questions. I am really glad that you have taken this up with such dedication, and I'm elated that I could be of some help to you.

Best
Arijit

Hey @sohonjit.ghosh
Thanks for the kind words and your help

Here is the link to the repository with my experiments.

I would appreciate it if you could take a look and give some feedback for me to back-propagate over and improve 🙂

It's very nice, @IvanK. The code is perfectly fine. You are doing really well. Keep on posting!