Size of image created by generator

In the generator code, the width and height of the image are not mentioned anywhere. How is the size of the generated image determined?

    def __init__(self, z_dim=10, im_chan=1, hidden_dim=64):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        # Build the neural network
        self.gen = nn.Sequential(
            self.make_gen_block(z_dim, hidden_dim * 4),
            self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
            self.make_gen_block(hidden_dim * 2, hidden_dim),
            self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True),
        )

How about looking in the make_gen_block function? The image size should be there, or if not, in one of the other helper functions.

There is no image size in make_gen_block

    def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, final_layer=False):
        '''
        Function to return a sequence of operations corresponding to a generator block of DCGAN, 
        corresponding to a transposed convolution, a batchnorm (except for in the last layer), and an activation.
        Parameters:
            input_channels: how many channels the input feature representation has
            output_channels: how many channels the output feature representation should have
            kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
            stride: the stride of the convolution
            final_layer: a boolean, true if it is the final layer and false otherwise 
                      (affects activation and batchnorm)
        '''

        #     Steps:
        #       1) Do a transposed convolution using the given parameters.
        #       2) Do a batchnorm, except for the last layer.
        #       3) Follow each batchnorm with a ReLU activation.
        #       4) If it's the final layer, use a Tanh activation after the deconvolution.

        # Build the neural block
        if not final_layer:
            return nn.Sequential(
                #### START CODE HERE ####
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.ReLU()
                #### END CODE HERE ####
            )
        else: # Final Layer
            return nn.Sequential(
                #### START CODE HERE ####
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.Tanh()
                #### END CODE HERE ####
            )

The helper function show_tensor_images does have a size parameter, but it is not used at all

    def show_tensor_images(image_tensor, num_images=25, size=(1, 28, 28)):
        '''
        Function for visualizing images: Given a tensor of images, number of images, and
        size per image, plots and prints the images in a uniform grid.
        '''
        image_tensor = (image_tensor + 1) / 2
        image_unflat = image_tensor.detach().cpu()
        image_grid = make_grid(image_unflat[:num_images], nrow=5)
        plt.imshow(image_grid.permute(1, 2, 0).squeeze())
        plt.show()

Hi mc04xkf!
Hope you are doing well. I understand that you got confused here, and it is good that you pointed this out, as working through it will give you a clearer idea of what’s happening in this code. I will try to explain it step by step. It is easy to follow if you have some idea about classes and objects; if you don’t, do refer to some tutorials on Python classes and inheritance before going through this post, as that will make it much easier to understand.
First, go to the last but one cell in the notebook.
You will see this:
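Roughly, that cell contains something like this (a sketch; the names z_dim and device are assumed from the notebook and may differ slightly in your copy):

    gen = Generator(z_dim).to(device)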

As you can see, they first create an object “gen” of the class Generator and pass in the z_dim parameter, which invokes the constructor and initializes self.z_dim in the Generator class. Now if you look at the last cell, there is this portion:
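That portion is roughly the following (a sketch; fake_noise stands for whatever batch of noise vectors the notebook builds in its training loop):

    # fake_noise: a batch of latent vectors of shape (batch_size, z_dim)
    fake = gen(fake_noise)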

Here, as you can see, the fake noise is passed to the object gen as if it were a function call. When you call gen(fake_noise), it invokes the __call__ method of the Generator instance gen with the input fake_noise, which is equivalent to calling gen.forward(fake_noise). The __call__ method is not defined explicitly in the Generator class, but Generator is not a base class; it inherits from torch.nn.Module. When the object gen was created, the super constructor was invoked too (initializing everything in the base class), so the base class methods can also be called on this object, and __call__ is defined in that base class.

So it is now clear that the call we saw above is equivalent to gen.forward(fake_noise). Let's look at the forward function.
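For reference, the two methods look roughly like this (a sketch reconstructed from the description below; check your notebook for the exact code):

    def unsqueeze_noise(self, noise):
        # Reshape each flat noise vector of length z_dim into a
        # (z_dim, 1, 1) "image" so it can be fed to the ConvTranspose2d stack.
        return noise.view(len(noise), self.z_dim, 1, 1)

    def forward(self, noise):
        x = self.unsqueeze_noise(noise)
        return self.gen(x)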

Once the forward function is called (its input is “noise”), it first calls the unsqueeze_noise function and assigns the returned value to x. This x is then used as the input to the sequential network self.gen, whose output is returned. Now if you look at the unsqueeze_noise function, you will finally get the answer to your question :star_struck:

Just read through the function and you will see where the initial image size is given: the noise is reshaped into a batch of noise tensors of shape (batch_size, number of channels, height, width) == (len(noise), self.z_dim, 1, 1), i.e. they have set height = width = 1 here.

Ok, sorry for this long story but hopefully this clears your doubt.
Regards,
Nithin

Thanks Nithin, I figured it out with your help.

The key is the output shape of ConvTranspose2d, which does the upsampling.

The output shape formula is
H_{out} = (H_{in} - 1) * stride - 2 * padding + dilation * (kernel\_size - 1) + output\_padding + 1
W_{out} = (W_{in} - 1) * stride - 2 * padding + dilation * (kernel\_size - 1) + output\_padding + 1

In our case, dilation is 1 and both padding and output\_padding are 0, so it is simply

H_{out} = (H_{in} - 1) * stride + kernel\_size
W_{out} = (W_{in} - 1) * stride + kernel\_size

The initial input shape is (z\_dim, 1, 1), which is (64, 1, 1)
After the first upsampling, the size is (256, 3, 3)
After the second upsampling, the size is (128, 6, 6)
After the third upsampling, the size is (64, 13, 13)
After the final upsampling, the size is (1, 28, 28), which is exactly the shape of the MNIST data

So the kernel size, stride, etc. are in fact chosen very carefully at each layer in order to get the desired output shape.
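If you want to check the arithmetic yourself, here is a minimal sketch (same kernel sizes and strides as above, with batchnorm and activations omitted since they do not change the spatial size) that prints the shape after each block:

    import torch
    import torch.nn as nn

    z_dim, hidden_dim, im_chan = 64, 64, 1
    blocks = nn.Sequential(
        nn.ConvTranspose2d(z_dim, hidden_dim * 4, kernel_size=3, stride=2),
        nn.ConvTranspose2d(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1),
        nn.ConvTranspose2d(hidden_dim * 2, hidden_dim, kernel_size=3, stride=2),
        nn.ConvTranspose2d(hidden_dim, im_chan, kernel_size=4, stride=2),
    )

    x = torch.randn(16, z_dim, 1, 1)  # a batch of 16 unsqueezed noise vectors
    for block in blocks:
        x = block(x)
        print(tuple(x.shape[1:]))
    # (256, 3, 3)
    # (128, 6, 6)
    # (64, 13, 13)
    # (1, 28, 28)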