Batch Normalization Gradients

Hello, I’ve implemented a model with 5 hidden layers and 80 nodes in each layer. I want to add batch normalization, but the problem is that I don’t know how to get the gradients of beta and gamma (a = gamma*z + beta). I’ve read some articles about the chain rule, but they were very confusing. So could anyone explain, step by step, how to get the gradients of beta and gamma in the backpropagation step, please?
I’m using MATLAB and my activation function is Tanh

Thank you very much

Hello Sadegh,

It’s great that you are trying to implement a neural network with batch normalization from scratch, but as Prof. Andrew Ng says in Course 1, taking gradients of cost functions in such high-dimensional spaces, with such complex functions, is one of the more complicated parts of deep learning.
You need a solid grasp of calculus, linear algebra, and matrix calculus before you can really work out what is happening when we take gradients of the cost function with respect to the other parameters.
In fact, the method we used in Course 1 for computing the gradients of W and b at each layer is not what common AI frameworks like TensorFlow and PyTorch use. They rely on a technique called automatic differentiation, and understanding automatic differentiation requires fairly deep knowledge of the mathematical fields I mentioned.
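
To give you a feel for that, here is a minimal sketch of letting automatic differentiation produce dGamma and dBeta for you instead of deriving them by hand. It assumes you have MATLAB’s Deep Learning Toolbox (which provides dlarray, dlfeval and dlgradient); the layer size, batch size, targets and toy loss are all made up for illustration.

```matlab
% Sketch only: save as demoAutodiffBN.m (hypothetical example, assumes
% the Deep Learning Toolbox). Automatic differentiation returns the
% gradients of the loss w.r.t. gamma and beta without a manual derivation.
function demoAutodiffBN()
    rng(0);
    z     = dlarray(randn(80, 32));   % 80 nodes, mini-batch of 32 (made-up sizes)
    gamma = dlarray(ones(80, 1));     % scale, one per node
    beta  = dlarray(zeros(80, 1));    % shift, one per node
    y     = dlarray(randn(80, 32));   % fake targets, only for this demo loss

    [loss, dGamma, dBeta] = dlfeval(@bnLoss, z, gamma, beta, y);
    disp(extractdata(loss));
end

function [loss, dGamma, dBeta] = bnLoss(z, gamma, beta, y)
    % Normalize each node over the mini-batch, then scale and shift:
    % a = gamma .* zhat + beta, which is the step you asked about.
    mu   = mean(z, 2);
    v    = mean((z - mu).^2, 2);
    zhat = (z - mu) ./ sqrt(v + 1e-8);
    a    = gamma .* zhat + beta;
    h    = tanh(a);                          % your activation function
    loss = 0.5 * mean((h - y).^2, 'all');    % toy mean-squared-error loss
    [dGamma, dBeta] = dlgradient(loss, gamma, beta);
end
```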

I know you were expecting someone to answer with equations deriving the gradients of the cost function with respect to the gamma and beta parameters, but on its own that won’t help you much in the real world, because it has little in common with how TensorFlow and similar frameworks actually compute those gradients.
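
That said, if you do want to sanity-check a from-scratch MATLAB implementation, the textbook gradients for the scale-and-shift step a = gamma .* zhat + beta are short. This is a sketch only: it assumes dA holds dL/da for one layer (nodes along the rows, the mini-batch along the columns) and that zhat, the normalized pre-activation, was cached during the forward pass.

```matlab
% Sketch only: dA is dL/da (80 x batchSize here), zhat is the normalized
% pre-activation cached from the forward pass, gamma is 80 x 1.
dGamma = sum(dA .* zhat, 2);   % gradient of the loss w.r.t. gamma, one value per node
dBeta  = sum(dA, 2);           % gradient of the loss w.r.t. beta, one value per node
% To keep backpropagating you also need dZhat = dA .* gamma, followed by the
% longer chain rule through the batch mean and variance to get dL/dz.
```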