Why does the Discriminator class give the desired output?

The Discriminator class has only Conv2d layers that affect the output dimensions (batchnorm and activation layers don’t change them). As I understand it, each Conv2d shrinks the height and width of the feature maps, and the output-channel argument says how many such feature maps to generate.
How does this discriminator know to produce a final output with no remaining height or width, where only a single value needs to be generated? The only way I can see this working is if the last layer’s kernel size matches the size of its input feature maps, so that the convolution produces a single value. I am just really confused. Unless the kernel size at each layer has been selected very carefully to produce the desired final output; and if so, that makes the model really rigid, and one can’t play with the architecture.
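For what it’s worth, Conv2d’s output size follows the standard formula `out = (in + 2*padding - kernel_size) // stride + 1`, so a kernel that exactly spans its input feature map does produce a single value. A quick sanity check (the numbers here are just illustrative):

```python
def conv2d_out_size(in_size, kernel_size, stride=1, padding=0):
    """Standard Conv2d output-size formula (square inputs assumed)."""
    return (in_size + 2 * padding - kernel_size) // stride + 1

# A 4 x 4 feature map hit with a kernel_size=4 convolution collapses to 1 x 1:
print(conv2d_out_size(4, kernel_size=4))  # -> 1
```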

You can have a Conv2d layer with 1 output channel and a 1 x 1 height and width. The way the Discriminator “knows” the layer sizes is that you tell it. Check the __init__ function of the Discriminator class to see how it specifies all the layers, including the final one. You can also add some code to print the shape of the Discriminator’s output. What do you see?
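For example, here is a minimal sketch of a DCGAN-style discriminator with shape printing added. The layer sizes (28 x 28 grayscale input, hidden_dim=16) are hypothetical, not necessarily the notebook’s exact architecture:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, im_chan=1, hidden_dim=16):
        super().__init__()
        self.disc = nn.Sequential(
            nn.Conv2d(im_chan, hidden_dim, kernel_size=4, stride=2, padding=1),        # 28x28 -> 14x14
            nn.LeakyReLU(0.2),
            nn.Conv2d(hidden_dim, hidden_dim * 2, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
            nn.LeakyReLU(0.2),
            nn.Conv2d(hidden_dim * 2, 1, kernel_size=7),  # 7x7 -> 1x1: kernel spans the whole map
        )

    def forward(self, x):
        return self.disc(x)

disc = Discriminator()
x = torch.randn(8, 1, 28, 28)  # batch of 8 fake 28x28 grayscale images
for layer in disc.disc:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# The last line prints: Conv2d (8, 1, 1, 1) -> one scalar prediction per image
```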

The other thing to say is that it’s only rigid in what you need the output of the final layer to be. You have complete freedom to decide how many hidden layers there are and what shapes they output.
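If you do want to loosen that last constraint, one common trick (not part of the original notebook) is to adaptively pool down to 1 x 1 before the final Conv2d, so the head produces a single value regardless of the input resolution:

```python
import torch.nn as nn

# Hypothetical replacement head: pool any H x W down to 1 x 1, then map the
# (illustrative) 32 channels to a single prediction per image.
final_block = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),          # any H x W -> 1 x 1
    nn.Conv2d(32, 1, kernel_size=1),  # 32 is a hypothetical channel count
)
```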