I think the mean and variance can't be scalar values: W3: Batch Normalization

bromstrong1 · September 12, 2022, 8:29am

Hi guys, I ran into a question while watching the Normalizing Activations in a Network video, at 6:27. Dr. Andrew Ng is talking about the matrix of intermediate values Z at layer l, where l is arbitrary and 1<l<L, so we must assume that the number of nodes at layer l can be greater than 1; let's say n[l] = 3, for example. That gives the matrix Z[l] the shape (3, m), where m is the number of samples. According to the first two equations, the mean and variance should be scalars, but if we actually plug the matrix Z[l] (which has the shape (3, m)) into those two equations, the resulting mean and variance will be vectors!
The only situation where the mean and variance are scalars, according to those two equations, is when layer l has only a single node. But that takes away the generality of these equations, since they should hold for any number of nodes in layer l.
Can you please point out where I went wrong? Thank you very much!

reinoudbosch · October 10, 2022, 4:46pm

Hi bromstrong1,

Thanks for your question; this issue confused me as well. You are right that with multiple nodes, mu will be a vector, with one entry per node (and the same goes for the variance). This becomes clear in the lecture ‘Fitting Batch Norm into a Neural Network’, at 9:58.
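
To make the shapes concrete, here is a minimal NumPy sketch. It is only an illustration, not code from the course: the layer width `n_l = 3`, the mini-batch size `m = 5`, the random `Z`, and the value of `eps` are all assumptions chosen for the example. The mean and variance are taken across the samples (axis 1), separately for each node, so both come out with shape (3, 1) rather than as scalars.

```python
import numpy as np

# Hypothetical sizes: 3 nodes in layer l, 5 samples in the mini-batch.
n_l, m = 3, 5
rng = np.random.default_rng(0)
Z = rng.normal(size=(n_l, m))  # stand-in for Z[l], shape (n_l, m)

# Average over the samples (axis=1), one value per node.
# keepdims=True keeps the result as a column so it broadcasts against Z.
mu = Z.mean(axis=1, keepdims=True)       # shape (n_l, 1)
sigma2 = Z.var(axis=1, keepdims=True)    # shape (n_l, 1), divides by m

eps = 1e-8  # small constant for numerical stability (assumed value)
Z_norm = (Z - mu) / np.sqrt(sigma2 + eps)  # shape (n_l, m)

print(mu.shape, sigma2.shape, Z_norm.shape)
```

After this step each row of `Z_norm` (that is, each node) has mean close to 0 and variance close to 1 across the mini-batch. With a single node (`n_l = 1`), `mu` and `sigma2` would have shape (1, 1), which is the scalar case the equations in the first video appear to describe.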
