Week 2 - Clarity on DCGAN

I’m on the week 2 assignment (C1_W2), which is building my first DCGAN, so I need clarity on a few things, please.

  1. I’d like some more light shed on why we had to use a deconvolution (nn.ConvTranspose2d) for the generator and nn.Conv2d for the discriminator.

  2. I would like to know why 0.5 was passed into the normalization method. What effect would it have on the result if other values were used?

  3. The weight_init function we created got me lost. Why do we use it?
    Where did those values come from? I would be so glad if anyone could break this down for me.

This was covered in the lectures, wasn’t it? The fundamental point is that the generator needs to expand the dimensions of the data. In most cases, we start from a one-dimensional “noise” vector of some size and we want the generator to produce a much larger object like an image. A transpose convolution is the “inverse” of a convolution: it generates a larger output if properly configured. So the usual technique is to cascade a number of transpose convolutions to create the output, frequently with other layers like activations included.
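For a concrete picture, here is a minimal PyTorch sketch of a generator built from cascaded transpose convolutions. The channel counts, kernel sizes, and output resolution are purely illustrative, not the assignment’s exact configuration:

```python
import torch
import torch.nn as nn

# Sketch of a DCGAN-style generator: each ConvTranspose2d upsamples the spatial
# dimensions, turning a 1-D noise vector into an image-sized tensor.
gen = nn.Sequential(
    # noise (64, 1, 1) -> (256, 4, 4)
    nn.ConvTranspose2d(64, 256, kernel_size=4, stride=1, padding=0),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    # (256, 4, 4) -> (128, 8, 8)
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    # (128, 8, 8) -> (1, 16, 16), tanh so pixel values land in [-1, 1]
    nn.ConvTranspose2d(128, 1, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)

noise = torch.randn(8, 64, 1, 1)   # batch of 8 noise vectors
print(gen(noise).shape)            # torch.Size([8, 1, 16, 16])
```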

For the discriminator, on the other hand, we are building a binary classifier: it takes a large object like an image and turns that into a single-bit “fake/not fake” classification output. A normal convolution reduces the dimensions from input to output if properly configured. We cascade a number of conv layers with activations together and then train them to produce the desired “yes/no” output. We see convnets of that style used as classifiers all the time, e.g. in DLS Course 4.
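And the mirror image for the discriminator, again with illustrative sizes only:

```python
import torch
import torch.nn as nn

# Sketch of a DCGAN-style discriminator: each Conv2d shrinks the spatial
# dimensions, ending in a single real/fake score per image.
disc = nn.Sequential(
    # (1, 16, 16) -> (128, 8, 8)
    nn.Conv2d(1, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),
    # (128, 8, 8) -> (256, 4, 4)
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2, inplace=True),
    # (256, 4, 4) -> (1, 1, 1): one logit per image
    nn.Conv2d(256, 1, kernel_size=4, stride=1, padding=0),
)

images = torch.randn(8, 1, 16, 16)
print(disc(images).view(-1).shape)  # torch.Size([8]) -- one score per image
```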

In neural networks of any kind, we need to initialize the weights randomly in order to implement “symmetry breaking”. If you start with all the weights the same, then when you run the training all the neurons will learn the same thing, which is not useful. That initialization routine is a pretty standard one that uses a normal distribution with \mu = 0 and \sigma set to a small value.
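Here is a sketch of an initializer in that style. The 0.02 standard deviation is the value used in the original DCGAN paper; I’m assuming the assignment uses something similar, so treat the exact numbers as illustrative:

```python
from torch import nn

def weight_init(m):
    # DCGAN-style initialization: draw conv / transpose-conv weights from a
    # normal distribution with mean 0 and a small std (0.02, per the DCGAN paper).
    # The random draws break symmetry so each filter can learn something different.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        nn.init.constant_(m.bias, 0)

# .apply() walks every submodule, so one call initializes the whole network:
# gen = gen.apply(weight_init)
```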

Well, you are welcome to try different values and see how it affects the results. Here’s the docpage for the torchvision Normalize transform. I confess that the comment there does not map to my understanding of the actual code that they gave you. Having \mu = 0.5 and \sigma = 0.5 will not result in only values between -1 and 1, at least according to my understanding.


Thank you so much, that was helpful.

I dug into the PyTorch Normalize transform to understand how the code comment relates to the parameters passed in:

In the documentation, it says that the transform normalizes each channel using this function:

output[channel] = (input[channel] - mean[channel]) / std[channel]

So, the parameters are not the actual mean and std of the data; they are adjustments applied to the input data to shift and scale those statistics.

In particular, in the assignment, the input MNIST dataset has values ranging from 0.0 to 1.0, so:

  • with a “mean” parameter of 0.5, (input[channel] - mean[channel]) gives us values ranging from -0.5 to 0.5, shifting the mean to the left by 0.5.
  • then, dividing that value by std[channel] with a “std” parameter of 0.5, we stretch out the range to get values from -1.0 to 1.0.

This is why the comments talk about normalizing the input values to fit the range -1.0 to 1.0.
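A quick way to check this is to run the transform on data in the MNIST range. This is just an illustrative snippet, not code from the assignment:

```python
import torch
from torchvision import transforms

# MNIST pixels arrive as values in [0.0, 1.0]; Normalize(0.5, 0.5) computes
# (x - 0.5) / 0.5, which shifts and stretches them into [-1.0, 1.0].
normalize = transforms.Normalize(mean=(0.5,), std=(0.5,))

fake_mnist = torch.rand(1, 28, 28)   # stand-in for a 1-channel MNIST image
out = normalize(fake_mnist)

print(fake_mnist.min().item(), fake_mnist.max().item())  # roughly 0.0 .. 1.0
print(out.min().item(), out.max().item())                # roughly -1.0 .. 1.0
```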

Thanks, Wendy! That makes total sense now. I was just thinking about “normal” distributions in general and hadn’t taken the step of going back and looking at the notebook to remind myself of what the inputs actually look like here.