In the third week of the course, there is a lecture on normalizing the activations of a neural network. I have a doubt about why there is a minus mu when calculating the variance sigma squared. To be precise, the formula for sigma squared is: sigma² = (1/m) * Σ (Z - mu)². I expected the variance to be computed from the data directly, without subtracting mu.
That subtraction is part of the definition of variance. To calculate it, you subtract the mean mu from every training example, square the result, sum over all examples, and divide by the number of examples. This is exactly what Prof. Ng does in the lecture: sigma² = (1/m) * Σ (z⁽ⁱ⁾ - mu)², where m is the number of training examples.
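The variance computation described above can be sketched in NumPy; the batch values here are made up purely for illustration:

```python
import numpy as np

# Hypothetical 1-D batch of pre-activation values Z (illustrative numbers only).
Z = np.array([2.0, 4.0, 6.0, 8.0])
m = Z.shape[0]

mu = Z.sum() / m                      # mean of the batch
sigma_sq = ((Z - mu) ** 2).sum() / m  # variance: average squared deviation from the mean

print(mu, sigma_sq)  # 5.0 5.0
```

Note that without the `- mu` term you would be averaging Z² itself (the second moment), not the spread of Z around its mean, so the subtraction is essential.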
After that, you apply the normalization formula from the lecture: z_norm = (z - mu) / sqrt(sigma² + epsilon), where epsilon is a small constant added for numerical stability.
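The normalization step can be sketched the same way; again the batch values and the epsilon of 1e-8 are assumptions for illustration:

```python
import numpy as np

Z = np.array([2.0, 4.0, 6.0, 8.0])  # hypothetical batch of pre-activations
mu = Z.mean()
sigma_sq = Z.var()   # NumPy's var() computes (1/m) * sum((Z - mu)**2)
eps = 1e-8           # small constant to avoid division by zero

Z_norm = (Z - mu) / np.sqrt(sigma_sq + eps)

# After normalizing, the batch has (approximately) mean 0 and variance 1.
print(Z_norm.mean(), Z_norm.var())
```

This is why the mean and variance are computed first: they are exactly the statistics needed to shift and rescale the batch to zero mean and unit variance.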