In the Week 4 lab, a weights_init(m) function was added to the code and applied to the generator and discriminator models. Why is weight initialization needed for the conditional GAN model, and not for the previous GAN models we studied?
Good observation! It turns out that it’s the same here in GANs as it is in DNNs or CNNs: the initialization algorithm you use is a “hyperparameter”, meaning a choice you need to make. There is no one magic “silver bullet” choice that works best in all cases. You’ll notice that the init code they gave us uses a normal (Gaussian) distribution with \mu = 0 and \sigma = 0.02. It turns out that PyTorch has a default weight initialization algorithm for each type of layer. For most of the types that I looked at (e.g. Conv2d) it uses a variant of the uniform distribution, centered at 0, with a range based on the number of input connections (the “fan-in”) of the particular layer. It’s not super clear in the documentation, but here’s the doc page for Conv2d and you can find it buried in all that verbiage.
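For reference, the weights_init(m) function in the lab follows the convention from the DCGAN paper (normal init with mean 0, std 0.02 for conv layers). A minimal sketch of what such a function typically looks like (the exact lab code may differ slightly, e.g. in how it handles BatchNorm layers):

```python
import torch.nn as nn

def weights_init(m):
    # Reinitialize conv-type layers with N(0, 0.02), per the DCGAN
    # convention; BatchNorm scale params are drawn around 1.0.
    # Other layer types keep PyTorch's built-in defaults.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.constant_(m.bias, 0.0)

# Module.apply() walks the model recursively, so one call covers
# every submodule, e.g.:
#   gen.apply(weights_init)
#   disc.apply(weights_init)
```

The string match on the class name is why this catches both Conv2d and ConvTranspose2d layers in one branch.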
They must have run the experiment and decided that convergence is better in this case with the normal distribution. You can try removing that code or rewriting it to use a Uniform Distribution and see how it affects the performance of the training in this example. Science!
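If you want to run that experiment, here is one hypothetical way to write the uniform variant. The bound 0.02 * sqrt(3) is chosen so the uniform distribution has the same standard deviation (0.02) as the normal version, which keeps the comparison fair; the name weights_init_uniform is mine, not from the lab:

```python
import torch.nn as nn

def weights_init_uniform(m):
    # Uniform on [-b, b] has std b / sqrt(3), so b = 0.02 * sqrt(3)
    # matches the spread of the original N(0, 0.02) init.
    bound = 0.02 * (3 ** 0.5)
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.uniform_(m.weight, -bound, bound)

# Swap it in with: gen.apply(weights_init_uniform)
```

Then compare training curves against the normal-init run and see which converges better.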
Thank you, @paulinpaloalto! For reference, I confirmed Paul’s comment about the Conv2d default initialization being a uniform distribution in this discussion: “What is the default initialization of a conv2d layer and linear layer?” - #2 by richard - PyTorch Forums.