Batch Normalization Gradients

Hello, I’ve implemented a model with 5 hidden layers and 80 nodes in each layer. I want to add batch normalization, but the problem is that I don’t know how to get the gradients of beta and gamma (a = gamma*z + beta). I’ve read some articles about the chain rule, but they were very confusing. So could anyone explain, step by step, how to get the gradients of beta and gamma in the backpropagation step, please?
I’m using MATLAB and my activation function is Tanh

Thank you very much

Hello Sadegh,

It’s great that you are trying to implement a neural network with batch normalization from scratch, but as Prof. Andrew Ng says in Course 1, taking gradients of cost functions in such high-dimensional spaces, with such complex functions, is one of the more complicated parts of deep learning.
You need a solid grasp of calculus, linear algebra, and matrix calculus before you can really work out what is happening when we take gradients of the cost function with respect to the other parameters.
In fact, the method we used in Course 1 for computing the gradients of W and b at each layer is not what common AI frameworks like TensorFlow and PyTorch use. They rely on a technique called automatic differentiation, and understanding automatic differentiation requires fairly deep knowledge of the mathematical fields I mentioned.
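
To give you a feel for that, here is a minimal sketch of letting automatic differentiation produce dGamma and dBeta for you instead of deriving them by hand. It assumes you have MATLAB’s Deep Learning Toolbox (which provides dlarray, dlfeval and dlgradient); the layer size, batch size, targets and toy loss are all made up for illustration.

```matlab
% Sketch only: save as demoAutodiffBN.m (hypothetical example, assumes
% the Deep Learning Toolbox). Automatic differentiation returns the
% gradients of the loss w.r.t. gamma and beta without a manual derivation.
function demoAutodiffBN()
    rng(0);
    z     = dlarray(randn(80, 32));   % 80 nodes, mini-batch of 32 (made-up sizes)
    gamma = dlarray(ones(80, 1));     % scale, one per node
    beta  = dlarray(zeros(80, 1));    % shift, one per node
    y     = dlarray(randn(80, 32));   % fake targets, only for this demo loss

    [loss, dGamma, dBeta] = dlfeval(@bnLoss, z, gamma, beta, y);
    disp(extractdata(loss));
end

function [loss, dGamma, dBeta] = bnLoss(z, gamma, beta, y)
    % Normalize each node over the mini-batch, then scale and shift:
    % a = gamma .* zhat + beta, which is the step you asked about.
    mu   = mean(z, 2);
    v    = mean((z - mu).^2, 2);
    zhat = (z - mu) ./ sqrt(v + 1e-8);
    a    = gamma .* zhat + beta;
    h    = tanh(a);                          % your activation function
    loss = 0.5 * mean((h - y).^2, 'all');    % toy mean-squared-error loss
    [dGamma, dBeta] = dlgradient(loss, gamma, beta);
end
```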

I know you were expecting someone to answer with equations deriving the gradients of the cost function with respect to the gamma and beta parameters, but on its own that won’t help you much in the real world, because it has little in common with how TensorFlow and similar frameworks actually compute those gradients.
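
That said, if you do want to sanity-check a from-scratch MATLAB implementation, the textbook gradients for the scale-and-shift step a = gamma .* zhat + beta are short. This is a sketch only: it assumes dA holds dL/da for one layer (nodes along the rows, the mini-batch along the columns) and that zhat, the normalized pre-activation, was cached during the forward pass.

```matlab
% Sketch only: dA is dL/da (80 x batchSize here), zhat is the normalized
% pre-activation cached from the forward pass, gamma is 80 x 1.
dGamma = sum(dA .* zhat, 2);   % gradient of the loss w.r.t. gamma, one value per node
dBeta  = sum(dA, 2);           % gradient of the loss w.r.t. beta, one value per node
% To keep backpropagating you also need dZhat = dA .* gamma, followed by the
% longer chain rule through the batch mean and variance to get dL/dz.
```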