Question about the behind-the-scenes of the Generative Adversarial Network

From week 1’s assignment: I know we have an input dimension of 64 initially, but I still fail to understand how the Generator class makes a 1x28x28 image from the noise vector of shape [128x64] that is passed in. Can you walk me through how the [128x64] input becomes [128x1x28x28]?

Hi Joshua!

Welcome to the community :wave:

I will walk you through it for a single noise sample (out of the 128 in the batch). This also involves some portions of the code that you have to implement yourself:

Initially, the shape of the noise vector (for a single sample) is (64, 1).
From the code (constructor of the Generator class, [UNQ_C2]):

        get_generator_block(z_dim, hidden_dim), -> [You have to implement get_generator_block; basically, it converts the input tensor shape from (z_dim, 1) to (hidden_dim, 1)]

→ You can infer this from the next call, where the input dimension is given as hidden_dim; implement get_generator_block accordingly (follow the optional hints in the notebook, and see the sketch after this list). Here hidden_dim is initialized to 128, so the shape is converted from (z_dim, 1) to (128, 1).

        get_generator_block(hidden_dim, hidden_dim * 2), -> [input shape (128, 1), output shape (256, 1)]
        get_generator_block(hidden_dim * 2, hidden_dim * 4), -> [input shape (256, 1), output shape (512, 1)]
        get_generator_block(hidden_dim * 4, hidden_dim * 8), -> [input shape (512, 1), output shape (1024, 1)]
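For reference, such a linear generator block is commonly built as Linear → BatchNorm → ReLU. A minimal sketch of that common pattern (not necessarily the notebook’s exact graded solution, so do follow the hints there):

    import torch.nn as nn

    def get_generator_block(input_dim, output_dim):
        # One common pattern for a linear generator block (sketch only):
        # a fully connected layer, then batch normalization, then ReLU.
        return nn.Sequential(
            nn.Linear(input_dim, output_dim),  # (batch, input_dim) -> (batch, output_dim)
            nn.BatchNorm1d(output_dim),        # normalize activations across the batch
            nn.ReLU(inplace=True),             # non-linearity
        )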

Now, you have to implement a final layer such that this vector of shape (1024, 1) gets converted to a vector of shape (im_dim, 1), and here im_dim is initialized to 784 (which is 28 × 28). This makes the output shape (784, 1), which is exactly what you needed: reshaping that 784-dimensional vector gives the 1 × 28 × 28 image.
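To connect this back to the [128x64] in your question: the same layers act on the whole batch at once, so the batch goes from [128, 64] to [128, 784], and a reshape then gives [128, 1, 28, 28]. A quick shape trace with plain Linear layers standing in for the blocks (illustrative only; the real blocks also contain other layers, per the notebook’s hints):

    import torch
    import torch.nn as nn

    z_dim, hidden_dim, im_dim = 64, 128, 784
    noise = torch.randn(128, z_dim)                 # [128, 64]

    gen = nn.Sequential(
        nn.Linear(z_dim, hidden_dim),               # [128, 64]   -> [128, 128]
        nn.Linear(hidden_dim, hidden_dim * 2),      # [128, 128]  -> [128, 256]
        nn.Linear(hidden_dim * 2, hidden_dim * 4),  # [128, 256]  -> [128, 512]
        nn.Linear(hidden_dim * 4, hidden_dim * 8),  # [128, 512]  -> [128, 1024]
        nn.Linear(hidden_dim * 8, im_dim),          # [128, 1024] -> [128, 784]
        nn.Sigmoid(),                               # squash pixel values into [0, 1]
    )

    fake = gen(noise)                  # [128, 784]
    images = fake.view(-1, 1, 28, 28)  # [128, 1, 28, 28]
    print(images.shape)                # torch.Size([128, 1, 28, 28])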

Conversely, in the training loop, each real image of size (1, 28, 28) is flattened into a vector of size (784, 1), so that real and fake samples have the same shape when fed to the discriminator.
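In code, that flattening is a single view call; a tiny sketch with a random stand-in batch:

    import torch

    real = torch.rand(128, 1, 28, 28)     # stand-in for a batch of MNIST images
    real_flat = real.view(len(real), -1)  # [128, 1, 28, 28] -> [128, 784]
    print(real_flat.shape)                # torch.Size([128, 784])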

Hope this clears things up. If not, feel free to post your queries.

Regards,
Nithin

Hello Nithin,

Thank you for responding. I just realized that the final layer does convert 1024 to 784 for week 1, which is 28 × 28. What I meant to ask about was week 2’s final layer: when using the deconvolution (transposed convolution) function with output channels = 1, input channels = 64, kernel size = 4, and stride = 2, how does this lead to a 28 × 28 image? (Maybe I should move this topic to week 2.)
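(For reference: with PyTorch’s default padding of 0, nn.ConvTranspose2d produces a spatial output of size (H_in − 1) × stride + kernel_size, so the question reduces to the size of the feature map entering that final layer. The check below assumes it is 13 × 13, which is what the earlier blocks produce under that notebook’s default kernel size and stride; treat that figure as an assumption if your settings differ.)

    import torch
    import torch.nn as nn

    # Transposed-convolution output size (padding 0, dilation 1, output_padding 0):
    #   H_out = (H_in - 1) * stride + kernel_size
    deconv = nn.ConvTranspose2d(in_channels=64, out_channels=1, kernel_size=4, stride=2)

    x = torch.randn(128, 64, 13, 13)  # assumed input: 13x13 feature maps
    print(deconv(x).shape)            # torch.Size([128, 1, 28, 28]); (13 - 1) * 2 + 4 = 28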