Hi guys, I encountered a question while watching the Normalizing Activations in a Network video, at 6:27. Since Dr. Andrew Ng is talking about the matrix of intermediate values Z at layer l, where l is arbitrary and 1 < l < L, we must assume that the number of nodes at layer l can be greater than 1; let's say n[l] = 3, for example. That makes the matrix Z[l] have shape (3, m), where m is the number of samples. According to the first two equations, the mean and variance should each be a scalar, but if we actually plug the matrix Z[l] (of shape (3, m)) into those two equations, the resulting mean and variance will be vectors!

The only situation where the mean and variance are scalars, according to those two equations, is when layer l has only a single node. But that would take away the generality of these equations, since they should hold true for any number of nodes in layer l.

Can you please point out where I went wrong? Thank you very much!

Hi bromstrong1,

Thanks for your question about this issue, which confused me as well. You are right that with multiple nodes mu will be a vector. This becomes clear in the lecture 'Fitting Batch Norm into a Neural Network', at 9:58.
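To make the shapes concrete, here is a minimal NumPy sketch (my own illustration, not code from the course) using the example of n[l] = 3 nodes and a mini-batch of m = 4 samples. The statistics are computed per node, i.e. along the sample axis, so mu and sigma^2 come out as column vectors of shape (n[l], 1) rather than scalars:

```python
import numpy as np

# Hypothetical layer with n[l] = 3 nodes and m = 4 samples in the mini-batch.
np.random.seed(0)
Z = np.random.randn(3, 4)

# Batch norm statistics are taken across the mini-batch (axis=1),
# one mean/variance per node, so the results have shape (n[l], 1).
mu = np.mean(Z, axis=1, keepdims=True)
var = np.var(Z, axis=1, keepdims=True)

print(mu.shape)   # (3, 1)
print(var.shape)  # (3, 1)

# Normalization then broadcasts mu and var across the m columns.
eps = 1e-8
Z_norm = (Z - mu) / np.sqrt(var + eps)
print(Z_norm.shape)  # (3, 4)
```

So the slide's scalar-looking equations are written for a single node; applied to the whole layer they operate row-wise, which is why mu becomes a vector.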
