# Bias parameter b - Fitting Batch Norm

In batch normalization, why does the parameter b cancel out?

In the post Batch Normalization: B parameter there is a similar small example: {a+k, b+k} and {a, b} have the same normalization.

But in the “Fitting Batch Norm into a Neural Network” video at 8:41, why does Andrew treat b as a constant? For a mini-batch at layer l we have Z = W*A + b, where b is a matrix, right? If we write out the first row of Z we get:
[W[0]*a1 + b11, W[0]*a2 + b12, W[0]*a3 + b13, …] where W[0] is the first row of W, the b's are the scalars of the first row of b, and the a's are the outputs of the previous layer. So the idea is that, component-wise, it is not a constant that is added.

Does this help?
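The algebra behind the cancellation: batch norm subtracts the per-unit batch mean, and a constant added to every example in the batch shifts that mean by exactly the same amount. For one unit with pre-activations $z_i$ over a mini-batch of size $m$:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} z_i, \qquad \frac{1}{m}\sum_{i=1}^{m}(z_i + b) = \mu + b$$

The standard deviation is unchanged, since each deviation $(z_i + b) - (\mu + b) = z_i - \mu$. So:

$$\frac{(z_i + b) - (\mu + b)}{\sigma} = \frac{z_i - \mu}{\sigma}$$

and the normalized value is the same with or without b, which is what the code below checks numerically.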

```python
import numpy as np

np.random.seed(1)

# 10 observations, each with 2 features
x = np.random.random((10, 2))

# Dense layer with 3 units
weights = np.random.random((2, 3))
biases = np.random.random((1, 3))

# Normalize each unit's pre-activations over the batch (axis=0)
get_z_tilda = lambda z: (z - z.mean(axis=0)) / z.std(axis=0)

# The normalized values are identical with and without the bias:
# subtracting the batch mean removes any constant added per unit.
z_tilda1 = get_z_tilda(x @ weights + biases)
z_tilda2 = get_z_tilda(x @ weights)
assert np.allclose(z_tilda1, z_tilda2)
```
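On the "b is a matrix" point: in the code above, `biases` has shape (1, 3), and NumPy broadcasting adds that same bias row to every example. So along the batch dimension, each unit's bias really is a single constant, not a row of different scalars. A quick sketch checking this (reusing the same shapes as above; the row-per-example layout is my convention, not Andrew's column-per-example notation, but the broadcasting argument is the same):

```python
import numpy as np

np.random.seed(1)
x = np.random.random((10, 2))        # 10 examples in rows
weights = np.random.random((2, 3))
biases = np.random.random((1, 3))    # one scalar per unit

z = x @ weights + biases

# Broadcasting adds the same (1, 3) bias row to every example,
# exactly as if it had been tiled into a full (10, 3) matrix:
z_tiled = x @ weights + np.tile(biases, (10, 1))
assert np.allclose(z, z_tiled)
```

In Z = W*A + b with examples in columns, the "first row of b" is the same scalar repeated across the mini-batch by broadcasting, i.e. b11 = b12 = b13, which is why it is treated as a constant that normalization then removes.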