Batch Normalization propagation

Sadegh_Rahmati · September 11, 2021, 6:22pm

Hello. I'm trying to implement batch normalization in my neural network model, but I'm having trouble applying the chain rule to get the gradients of gamma and beta. Let's say in the output layer we have x; xhat = (x - mean(x)) / std(x); z = xhat*gamma + beta; a = f(z) (f is the activation function).
In the chain rule we first have to compute dloss/dz. First question: is dloss/dz = f'(z) * (a - y)/m?
Second question: in the final step we get dloss/dx, so if we move to the previous layer to apply the chain rule again, is this computed dloss/dx equal to dloss/dz in the previous layer?

Thank you very much

balaji.ambresh · April 22, 2022, 7:52pm

Please see this paper
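
In case it helps alongside the paper, here is a minimal NumPy sketch of the forward and backward pass for the setup in the question (xhat = (x - mean) / std, z = xhat*gamma + beta). The function names `batchnorm_forward` and `batchnorm_backward`, the `eps` term, and the (m, n) batch layout are assumptions made for illustration, not something from the course code. The incoming `dz` is dloss/dz, which by the chain rule is dloss/da * f'(z); with a squared-error loss averaged over m examples that does work out to f'(z) * (a - y)/m. Assuming x is the linear output W·a_prev + b of the same layer, the returned `dx` is not yet dloss/dz of the previous layer: it still has to pass back through W and through the previous layer's own batch norm and activation.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x has shape (m, n): m examples in the mini-batch, n units."""
    mu = x.mean(axis=0)                  # per-unit batch mean
    var = x.var(axis=0)                  # per-unit (biased) batch variance
    inv_std = 1.0 / np.sqrt(var + eps)   # eps is an assumed stabiliser
    xhat = (x - mu) * inv_std            # normalised input
    z = gamma * xhat + beta              # scale and shift
    cache = (xhat, gamma, inv_std)
    return z, cache

def batchnorm_backward(dz, cache):
    """dz is dloss/dz with shape (m, n); returns dloss/dx, dloss/dgamma, dloss/dbeta."""
    xhat, gamma, inv_std = cache
    m = dz.shape[0]
    dgamma = np.sum(dz * xhat, axis=0)   # z = gamma * xhat + beta
    dbeta = np.sum(dz, axis=0)
    dxhat = dz * gamma
    # The mean and variance both depend on every x in the batch,
    # which is where the two subtracted terms come from.
    dx = (inv_std / m) * (
        m * dxhat
        - np.sum(dxhat, axis=0)
        - xhat * np.sum(dxhat * xhat, axis=0)
    )
    return dx, dgamma, dbeta
```

A finite-difference check against `batchnorm_forward` is a quick way to convince yourself the three gradients are right before wiring this into the rest of the network.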
