In the Week 2 programming assignment, we have to create a generator block. There are two types of block: final and non-final. Batch norm is used in the non-final block, but not in the final block. Why is that?
Hi Sairam!
Hope you are having a great day. "Block" here refers to a set of layers. If it is not the final block (that is, the "final layer" parameter passed to the function is false), the call returns a block with three layers (ConvTranspose → BatchNorm → ReLU); if the "final layer" parameter is set to true, we are calling the function to get the final layers of our generator.
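For reference, here is a minimal sketch of what such a block builder could look like in PyTorch. The function name and parameters below are just placeholders for illustration, not necessarily the exact names the assignment uses:

```python
import torch.nn as nn

def make_gen_block(input_channels, output_channels,
                   kernel_size=3, stride=2, final_layer=False):
    # Non-final block: ConvTranspose2d -> BatchNorm2d -> ReLU
    if not final_layer:
        return nn.Sequential(
            nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
            nn.BatchNorm2d(output_channels),
            nn.ReLU(),
        )
    # Final block: ConvTranspose2d -> Tanh, with no batch norm
    return nn.Sequential(
        nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
        nn.Tanh(),
    )
```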
Coming to your question: batch normalization is generally used to speed up training and to reduce internal covariate shift (the change in the distribution of a layer's inputs during training), so in general batch norm is applied. As for the final layer, there is no rule that we cannot apply batch norm before tanh; we could do it that way too. It simply depends on the implementation: here it is implemented this way, but there are alternatives, and you can try plugging in batch norm before tanh and comparing the performance. The assignment just gives us the recommended way, so ultimately it is our decision how to implement it, but from the assignment's point of view it is better to follow what is recommended; you can play with alternatives in a private notebook.
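If you do want to try that experiment in a private notebook, the alternate final block would look something like this (purely illustrative, assuming the same placeholder names as above, and not what the grader expects):

```python
import torch.nn as nn

def make_final_block_with_bn(input_channels, output_channels,
                             kernel_size=3, stride=2):
    # Variant final block: batch norm inserted before the tanh output activation
    return nn.Sequential(
        nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
        nn.BatchNorm2d(output_channels),  # extra batch norm the assignment leaves out
        nn.Tanh(),
    )
```

You could swap this in for the final block of your generator and compare the generated samples against the recommended version.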
@sairampoosarlahyd and @Nithin_Skantha_M,
One more point on this - the assignment follows the architecture defined for the official DCGAN, and the DCGAN paper explains that, through experimentation, the authors found it best to leave out the batchnorm for the final layer. Specifically, the paper says:
Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.
So, we are benefitting from their experimentation.