C1_W2_Assignment: How is the image dimension even provided?


As you can see from the notebook, the generated images have fixed dimensions. However, I have read through the whole notebook and I can’t figure out how the dimension information is provided to either the generator or the discriminator at all.

The only image information we have is the number of channels, which is 1 for grayscale images. So how can the networks figure out the dimensions of the images from just the number of channels?

Thanks a lot

The operations that we’re doing here are convolutions and transposed convolutions. The way those work, you don’t need to know the height and width (h and w) a priori. Consider the forward convolution case: you just need to know the number of channels of the input, because that determines the shape of the filters. So suppose the inputs are RGB images with 3 color channels and you’re using a “kernel size” or filter size of 3. Then each filter has shape 3 x 3 x 3 and you can slide it over images of any vertical and horizontal size, right? Of course the input size will determine the output shapes, but the convolution will work without knowing the h and w dimensions a priori. The formula for one of the dimensions on a forward convolution, where n_{in} is the input size, f is the filter size, p is the padding and s is the stride, is:

n_{out} = \displaystyle \lfloor \frac {n_{in} + 2p - f}{s} \rfloor + 1

So suppose n_{in} is 64, f = 3, s = 1 and p = 0, then we get:

n_{out} = \displaystyle \lfloor \frac {64 - 3}{1} \rfloor + 1 = 62

And if n_{in} is 256, we get:

n_{out} = \displaystyle \lfloor \frac {256 - 3}{1} \rfloor + 1 = 254
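
In case it helps to play with the numbers, here is a minimal Python sketch of the formula above, plus the standard counterpart for a transposed convolution (assuming no dilation and no output padding). The function names are my own, and the generator layer settings at the bottom are hypothetical values chosen for illustration, not taken from the notebook. The point is only that the output size falls out of the kernel size, stride and padding of each layer, so nobody has to pass h and w in explicitly.

```python
def conv_out_size(n_in, f, s=1, p=0):
    """Output size along one dimension of a forward convolution."""
    # Integer floor division implements the floor in the formula.
    return (n_in + 2 * p - f) // s + 1


def conv_transpose_out_size(n_in, f, s=1, p=0):
    """Output size along one dimension of a transposed convolution.

    Assumes no dilation and no output padding.
    """
    return (n_in - 1) * s - 2 * p + f


# The same filter settings work for two different input sizes.
for n_in in (64, 256):
    print(n_in, "->", conv_out_size(n_in, f=3, s=1, p=0))

# Hypothetical generator stack (NOT read from the notebook):
# (filter size, stride) per transposed convolution, starting from a
# 1 x 1 spatial input such as a reshaped noise vector.
layers = [(3, 2), (4, 1), (3, 2), (4, 2)]
size = 1
sizes = []
for f, s in layers:
    size = conv_transpose_out_size(size, f, s)
    sizes.append(size)
print(sizes)  # [3, 6, 13, 28]
```

With those assumed settings, a 1 x 1 input grows to 28 x 28 purely as a consequence of the layer hyperparameters. A different choice of kernel sizes or strides would give a different fixed output size.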