Different Mean/Standard Dev values for hidden units

Muhammad_Bin_Usman · July 16, 2023, 7:46am

In batch normalization, Andrew said we may not want all hidden units to have mean 0 and standard deviation 1. If I guess correctly, it's to "break symmetry".

Muhammad_Bin_Usman · July 16, 2023, 7:53am

And if we really wanted a larger variance to take advantage of the sigmoid's non-linearity, then why did we even normalize it?

Christian_Simonis · July 16, 2023, 9:28am

Hi @Muhammad_Bin_Usman,

welcome to the community and thanks for your question!

Batch normalization can help accelerate training by aligning the activation statistics across batches, so that training proceeds more consistently. The idea is to tackle the problem of internal covariate shift (a systematic change in the distribution of network activations during training). This is well outlined in this article: Internal Covariate Shift: How Batch Normalization can speed up Neural Network Training | by Jamie Dowat | Analytics Vidhya | Medium, and in this paper: [1502.03167] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Can you elaborate a bit more on what you mean here specifically? In general, batch normalisation is about keeping layer activations consistent so that the layers fit well together, especially since the weights in each layer change during training and the batches of training data differ. This way we want to make sure that gradients flow efficiently and stay stable (e.g. the risk of vanishing gradients is reduced). See also this thread: Vanishing/Exploding Gradients when there is a non-linear activation function - #3 by Christian_Simonis
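
On the original question about different means and standard deviations: in the batch norm transform described in the paper above, the normalized value is rescaled by a learnable scale (gamma) and shifted by a learnable offset (beta), so each hidden unit can end up with its own mean and spread instead of being pinned to mean 0 and standard deviation 1. Below is a minimal NumPy sketch of the forward pass only. The function name `batch_norm_forward`, the random input data, and the chosen gamma/beta values are illustrative assumptions, not code from the course.

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-5):
    """Sketch of a batch norm forward pass (training mode, no running averages).

    z:     (batch_size, n_units) pre-activations
    gamma: (n_units,) learnable scale, one value per hidden unit
    beta:  (n_units,) learnable shift, one value per hidden unit
    """
    mu = z.mean(axis=0)                      # per-unit mean over the mini-batch
    var = z.var(axis=0)                      # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)   # roughly mean 0, variance 1
    return gamma * z_norm + beta             # per-unit mean beta, std close to gamma

# Hypothetical pre-activations for 2 hidden units, deliberately off-centre.
rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=5.0, size=(256, 2))

gamma = np.array([1.0, 2.5])   # unit 0 keeps std near 1, unit 1 gets a wider spread
beta = np.array([0.0, -1.0])   # unit 0 stays centred at 0, unit 1 is shifted to -1
z_tilde = batch_norm_forward(z, gamma, beta)

print(z_tilde.mean(axis=0))    # close to beta
print(z_tilde.std(axis=0))     # close to gamma
```

With gamma = 1 and beta = 0 a unit stays at mean 0 and standard deviation 1. Because gamma and beta are trained like any other parameters, the network can choose a different mean and variance for a unit if that works better with the activation function that follows.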

Hope that helps.

Best regards
Christian
