Requesting clarification on Image Generation

Hello

About Week 2's programming assignment: how does the final layer, a deconvolution (transposed convolution) with output channels = 1, input channels = 64, kernel size = 4, and stride = 2, convert its input to a 28×28 image [128, 1, 28, 28]?

Hi, @Joshua_Siy!
I assume that the essence of your question is "how did a latent noise tensor of size [w, h] result in a [28, 28] output?" (please correct me if this isn't what you're asking).
Firstly, in the screenshot above I've printed the Generator's layer summary. The output size of each transposed-convolution layer is determined by the formula (W − 1) × S − 2P + K. Please use this link to experiment with different input tensor sizes and layer parameters.
Secondly, I've printed the tensor sizes after every layer in the generator. If you compute the tensor sizes using the formula above, you should arrive at the final tensor size, [n_samples, 1, 28, 28].
Please let me know if my reply wasn't helpful or if you have any follow-up questions.
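To reproduce those printed sizes, here is a minimal sketch of the generator's transposed-conv stack. The channel widths (10 → 256 → 128 → 64 → 1) are my assumption, pieced together from the shapes quoted in this thread; check them against your notebook. BatchNorm and ReLU are omitted since they don't change tensor shapes:

```python
import torch
from torch import nn

# Assumed generator stack: z_dim = 10, channel widths inferred from the
# shapes discussed in this thread ([100, 256, 3, 3] after the first layer,
# 64 input channels into the final layer). Padding defaults to 0.
layers = nn.Sequential(
    nn.ConvTranspose2d(10, 256, kernel_size=3, stride=2),   # 1x1   -> 3x3
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=1),  # 3x3   -> 6x6
    nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2),   # 6x6   -> 13x13
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2),     # 13x13 -> 28x28
)

x = torch.randn(100, 10, 1, 1)  # [n_samples, z_dim, 1, 1]
for layer in layers:
    x = layer(x)
    print(tuple(x.shape))
# (100, 256, 3, 3)
# (100, 128, 6, 6)
# (100, 64, 13, 13)
# (100, 1, 28, 28)
```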

Kind regards,
DK.

Thank you for your response. I do have another question, about the first layer: how did the input noise vector [n_samples, z_dim] become the output [100, 256, 3, 3]? Using the simulator provided, I'm not sure what the input is, so I'll just paste my result for the first layer from the simulator. Please advise, as I'm unsure of what I'm doing.

I think I'm getting a hint: the noise vector is actually [100, 64, 1, 1], because if I upsample a 1×1 image with a kernel of 3 and a stride of 2, the result is 3×3, correct?

I just found the answer after looking at the documentation for ConvTranspose2d.

Just writing down what I understood.
In short, excluding the terms that evaluate to zero (output_padding = 0, dilation = 1), the formula for the output height and width is:
Output size = (Input size − 1) × stride − 2 × padding + kernel size
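That simplified formula can be checked with a small helper. The full expression below follows the shape formula in the PyTorch nn.ConvTranspose2d documentation; with the default padding, output_padding, and dilation it reduces to the one-liner above:

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output spatial size of nn.ConvTranspose2d, per the PyTorch docs."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

print(conv_transpose_out(1, kernel=3, stride=2))    # 3
print(conv_transpose_out(3, kernel=4, stride=1))    # 6
print(conv_transpose_out(6, kernel=3, stride=2))    # 13
print(conv_transpose_out(13, kernel=4, stride=2))   # 28
```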

Given an input with batch size and channel count of (100, 64):

We deconvolve this with a layer with a kernel of 3 and a stride of 2. However, I'm not sure what the image size is, so I just assumed it is 1×1.

1×1 with a kernel of 3 and stride of 2 results in 3×3.

3×3 with a kernel of 4 and stride of 1 results in 6×6.

6×6 with a kernel of 3 and stride of 2 results in 13×13.

13×13 with a kernel of 4 and stride of 2 results in 28×28.
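A quick sanity check of those four steps, applying an actual nn.ConvTranspose2d to a dummy single-channel tensor at each stage (the channel count doesn't affect the spatial size, so 1 is used throughout):

```python
import torch
from torch import nn

# (in_size, kernel, stride, expected out_size) for each step above
steps = [
    (1, 3, 2, 3),
    (3, 4, 1, 6),
    (6, 3, 2, 13),
    (13, 4, 2, 28),
]
for in_size, kernel, stride, expected in steps:
    layer = nn.ConvTranspose2d(1, 1, kernel_size=kernel, stride=stride)
    out = layer(torch.randn(1, 1, in_size, in_size))
    print(f"{in_size}x{in_size} -> {out.shape[-1]}x{out.shape[-1]}")
    assert out.shape[-1] == expected
```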

Please correct me if I’m wrong. Thanks.

The initial dimension of the noise vector is [n_samples, z_dim] → [100, 10]. In the forward() method of the Generator class this vector gets unsqueezed to [100, 10, 1, 1]; this 4-dimensional tensor can be treated as [batch_size, n_channels, h, w]. So you're correct in your assumption that the "image" dimension of the noise is 1×1. I only struggled to understand where the 64 was coming from.
(The 64 is the output channel dimension of the second generator block: it takes in 10 channels and produces 64.)

The rest is correct :+1: