Batch norm in the critic in WGAN-GP assignment

I was reading the paper that proposed the gradient penalty (GP) for the WGAN architecture, and I found this paragraph:

No critic batch normalization
Most prior GAN implementations [22, 23, 2] use batch normalization in both the generator and the discriminator to help stabilize training, but batch normalization changes the form of the discriminator’s problem from mapping a single input to a single output to mapping from an entire batch of inputs to a batch of outputs [23]. Our penalized training objective is no longer valid in this setting, since we penalize the norm of the critic’s gradient with respect to each input independently, and not the entire batch. To resolve this, we simply omit batch normalization in the critic in our models, finding that they perform well without it. Our method works with normalization schemes which don’t introduce correlations between examples. In particular, we recommend layer normalization [3] as a drop-in replacement for batch normalization.
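For concreteness, my understanding is that the penalty is computed per example, something like the sketch below (critic, real, and fake are placeholder names for the critic network and a batch of real and generated images, assumed not to require grad themselves):

import torch

# Mix each real/fake pair with its own epsilon.
epsilon = torch.rand(real.size(0), 1, 1, 1, device=real.device)
mixed = (epsilon * real + (1 - epsilon) * fake).requires_grad_(True)

scores = critic(mixed)

# Gradient of each score with respect to its own mixed image.
grad = torch.autograd.grad(
    outputs=scores,
    inputs=mixed,
    grad_outputs=torch.ones_like(scores),
    create_graph=True,
)[0]

# One gradient norm per example, each penalized toward 1 independently.
grad = grad.view(grad.size(0), -1)
gp = ((grad.norm(2, dim=1) - 1) ** 2).mean()

If the critic used batch norm, each score would depend on the whole batch, so these per-example gradient norms would no longer mean what the penalty assumes.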

But in the assignment for this week, the critic has a batch norm in every layer except the final layer:

if not final_layer:
    return nn.Sequential(
        nn.Conv2d(input_channels, output_channels, kernel_size, stride),
        nn.BatchNorm2d(output_channels),
        nn.LeakyReLU(0.2, inplace=True),
    )

I wonder, what is the reason for this batch norm? I’m trying to build a WGAN-GP for generating a fictional character, and this information would be of enormous value to me.

Hi Gustavo!
Welcome to the community :wave:. I hope that you are doing well.

You can see these lines in the assignment (below the heading “Generator and Critic”):

You will begin by importing some useful packages, defining visualization functions, building the generator, and building the critic. Since the changes for WGAN-GP are done to the loss function during training, you can simply reuse your previous GAN code for the generator and critic class. Remember that in WGAN-GP, you no longer use a discriminator that classifies fake and real as 0 and 1 but rather a critic that scores images with real numbers.

So as you can see, the assignment just focuses on the main objective: tracking the changes in the loss function (which is the major change in WGAN). It is not intended to replicate the paper fully.
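For reference, the loss functions you implement there look roughly like this (a minimal sketch; the function and argument names are just illustrative, with gp being the gradient penalty and c_lambda its weight):

def get_crit_loss(crit_fake_pred, crit_real_pred, gp, c_lambda):
    # The critic wants to score reals high and fakes low; the gradient
    # penalty term keeps it approximately 1-Lipschitz.
    return crit_fake_pred.mean() - crit_real_pred.mean() + c_lambda * gp

def get_gen_loss(crit_fake_pred):
    # The generator wants the critic to score its fakes as high as possible.
    return -crit_fake_pred.mean()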

However, on your end, you can try to replicate the paper and follow what it recommends. But you will come across better GANs in the upcoming courses of the specialization, so don’t just settle on WGAN for your project; check out everything.
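If you do decide to follow the paper on this point, the change in the critic block is small: swap the batch norm for a per-example normalization. Here is a sketch, assuming the same block structure as in the assignment (the function name and the default kernel size and stride are just placeholders; nn.GroupNorm with a single group normalizes each sample over all of (C, H, W), which matches layer normalization):

import torch.nn as nn

def make_crit_block(input_channels, output_channels, kernel_size=4, stride=2):
    # Each sample is normalized on its own, so no statistics are shared
    # across the batch and the gradient penalty stays valid.
    return nn.Sequential(
        nn.Conv2d(input_channels, output_channels, kernel_size, stride),
        nn.GroupNorm(1, output_channels),  # layer norm over (C, H, W)
        nn.LeakyReLU(0.2, inplace=True),
    )

PyTorch’s nn.LayerNorm needs a fixed normalized shape, so GroupNorm with one group is a common way to get layer-norm behavior on conv feature maps of any spatial size.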

Regards,
Nithin

Thank you for your reply @Nithin_Skantha_M!

I thought that maybe some researchers had found that batch norm was actually useful in WGAN-GP, because not only the assignment but also some code I’ve found on GitHub uses batch norm, but your answer clarified that.

I actually finished the specialization, and there are more robust and powerful GAN models, but I couldn’t think of a way to apply them, which is why my plan was to stick with WGAN-GP.

To be clearer: I’m doing the final project for my specialization in Data Science & Analytics, and I’m trying to generate a Brazilian folkloric character. I’ve looked through many datasets and never found any images of this character, so I gathered images from across the internet, and I’m trying to generate more images of it and hopefully attract more attention to Brazilian folklore.
I don’t know if I misunderstood, but the other architectures I learned in the specialization, like StyleGAN, CycleGAN, and Pix2Pix, were focused on transferring styles from one image to another. I have a very small dataset, with 827 images so far (it’s hard to find more images of this character). What, in your opinion, would be a good alternative? Thank you very much for your attention.

Hi Gustavo!
That’s cool, I wasn’t clear on your problem statement before. Yes, the architectures taught in the course are more for transferring styles, but there are a few other GAN variants in research too. For your problem statement, WGAN-GP can be a good fit, as it focuses on improving the stability and convergence of GAN training. Please go ahead and proceed with WGAN itself.

I’m glad that I didn’t misunderstand the concept behind those GAN architectures and that I’m making efforts in the right direction. Thank you very much for your reply!
