I am confused about how mu (the mean) is calculated in Batch Normalization for the hidden layers of a neural network.
First, for the purposes of the question, let's assume our minibatch size is 64 and that our given layer l has 5 activation nodes.
So, in a vectorized implementation, Z[l] has shape (5, 64). We then need to run this through batch normalization before applying the activation function.
For the calculation of mu, over what dimension in Z[l] are we taking the average? Meaning, are we averaging across the 5 activations (the rows), or are we averaging across the 64 examples (the columns)? What would the resulting shape of mu be?
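To make the two options concrete, here is a minimal NumPy sketch (the array `Z` is just a stand-in for Z[l] with the shapes from my example):

```python
import numpy as np

# Stand-in for Z[l]: 5 activation nodes x 64 minibatch examples
Z = np.random.randn(5, 64)

# Option A: average across the 64 examples (the columns),
# giving one mean per activation node
mu_per_node = np.mean(Z, axis=1, keepdims=True)
print(mu_per_node.shape)  # (5, 1)

# Option B: average across the 5 activations (the rows),
# giving one mean per example
mu_per_example = np.mean(Z, axis=0, keepdims=True)
print(mu_per_example.shape)  # (1, 64)
```

So the question is really: which of these two shapes should mu have?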
Thank you, to anyone who helps!