In C2W3’s video on implementing batch normalization in neural networks, Prof. Andrew Ng mentioned that the bias term is unnecessary since it gets cancelled out during mean subtraction, and its role is instead taken over by the learned parameter β (beta). But doesn’t the bias term contribute to the mean of Z^[l]? Wouldn’t removing it cause us to underestimate or overestimate the true mean of Z^[l]?
Perhaps I still can’t grasp Prof. Ng’s explanation of why the bias term is rendered unnecessary in that video. Would anyone be able to elaborate in more detail on why this is the case?
Any help would be appreciated, thanks in advance.
Let’s say a neuron computes z = 3x + 7, and that this batch contains 10 samples: [6, 7, 8, 9, 0, 1, 2, 3, 4, 5]. What will the outcome be after batch normalization?
If you change 3x + 7 to 3x + 700, will the outcome change because of the different bias term?
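You can check this directly with a quick numerical sketch (plain NumPy; the helper name `batch_norm` is just the standard normalization step z_norm = (z − μ) / √(σ² + ε), not the course code, and γ/β are omitted to focus on the cancellation):

```python
import numpy as np

# Batch of 10 inputs from the example above.
x = np.array([6, 7, 8, 9, 0, 1, 2, 3, 4, 5], dtype=float)

def batch_norm(z, eps=1e-8):
    """Standard batch-norm normalization step: subtract the batch mean
    and divide by the batch standard deviation (before gamma/beta)."""
    mu = z.mean()
    var = z.var()
    return (z - mu) / np.sqrt(var + eps)

z_small_bias = 3 * x + 7    # neuron with bias 7
z_large_bias = 3 * x + 700  # same neuron with bias 700

print(batch_norm(z_small_bias))
print(batch_norm(z_large_bias))
print(np.allclose(batch_norm(z_small_bias), batch_norm(z_large_bias)))  # True
```

Both versions print the same normalized values, and the comparison prints True. The reason is that the mean μ = mean(3x) + b already contains the bias, so the subtraction z − μ removes b entirely, whether it is 7 or 700. The bias does shift the mean of Z^[l], but that shift is exactly what mean subtraction discards, which is why the bias is redundant and its job of shifting the output is handed to the learned β.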