In the Week 2 programming assignment, we have to create a generator block. There are two types of block: final and non-final. Batch norm is used in the non-final block, but not in the final block. Why is that?
Hi Sairam!
Hope you are having a great day. "Block" here refers to a set of layers. If it is not the final block (that is, the "final layer" parameter passed to the function is false), the call returns a block with three layers (ConvTranspose → BatchNorm → ReLU); if the "final layer" parameter is set to true, we are calling the function to get the final layers of our generator.
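For reference, here is a minimal sketch of what such a block builder could look like in PyTorch. The function name and parameters below are just placeholders for illustration, not necessarily the exact names the assignment uses:

```python
import torch.nn as nn

def make_gen_block(input_channels, output_channels,
                   kernel_size=3, stride=2, final_layer=False):
    # Non-final block: ConvTranspose2d -> BatchNorm2d -> ReLU
    if not final_layer:
        return nn.Sequential(
            nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(),
        )
    # Final block: ConvTranspose2d -> Tanh, with no batch norm
    return nn.Sequential(
        nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
        nn.Tanh(),
    )
```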
Coming to your question: batch normalization is generally used to speed up training and to reduce internal covariate shift (the change in the distribution of a layer's inputs during training), so in general batch norm is applied. As for the final layer, there is no rule that we cannot apply batch norm before tanh; we could do it that way too. It simply depends on the implementation: here it is implemented this way, but there are alternatives, and you can try plugging in batch norm before tanh and comparing the performance. The assignment just gives us the recommended way, so ultimately it is our decision how to implement it, but from the assignment's point of view it is better to follow what is recommended; you can play with alternatives in a private notebook.
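If you do want to try that experiment in a private notebook, the alternate final block would look something like this (purely illustrative, assuming the same placeholder names as above, and not what the grader expects):

```python
import torch.nn as nn

def make_final_block_with_bn(input_channels, output_channels,
                             kernel_size=3, stride=2):
    # Variant final block: batch norm inserted before the tanh output activation
    return nn.Sequential(
        nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
        nn.BatchNorm2d(output_channels),  # extra batch norm the assignment leaves out
        nn.Tanh(),
    )
```

You could swap this in for the final block of your generator and compare the generated samples against the recommended version.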
@sairampoosarlahyd and @Nithin_Skantha_M,
One more point on this - the assignment follows the architecture defined for the official DCGAN, and the DCGAN paper explains that, through experimentation, the authors found it best to leave out the batchnorm for the final layer. Specifically, the paper says:
Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.
So, we are benefitting from their experimentation.