C2W3 Batch Normalization: Why is bias term redundant?

In C2W3’s video on implementing batch normalization in neural networks, Prof Andrew Ng mentions that the bias term is unnecessary because it gets cancelled out during mean subtraction, and its role is instead taken over by the learned parameter beta. But doesn’t the bias term contribute to the mean of Z^[l], so that removing it would underestimate or overestimate the true mean of Z^[l]?

Perhaps I still can’t grasp Prof Ng’s explanation of why the bias term is rendered unnecessary in that video. Would anyone be able to elaborate in more detail on why this is the case?
Any help would be appreciated, thanks in advance.

Under- or over-estimating the true mean of Z does not affect z_{norm}^{(i)}: whatever constant the bias adds to every z^{(i)} is added to the batch mean as well, so it is subtracted right back out during normalization.
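A short sketch of the cancellation, written for a single unit with z^{(i)} = w x^{(i)} + b over a mini-batch of m examples:

\mu = \frac{1}{m} \sum_{j=1}^{m} z^{(j)} = \frac{1}{m} \sum_{j=1}^{m} w x^{(j)} + b

z^{(i)} - \mu = w x^{(i)} - \frac{1}{m} \sum_{j=1}^{m} w x^{(j)}

The bias b drops out of z^{(i)} - \mu, so it never reaches z_{norm}^{(i)} = (z^{(i)} - \mu) / \sqrt{\sigma^2 + \epsilon}, and the learned \beta in \tilde{z}^{(i)} = \gamma \, z_{norm}^{(i)} + \beta supplies whatever shift the network actually needs.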

Hi, can anyone explain this more? I have the same question.

Hello @ansonchantf,

Let’s say this neuron computes 3x + 7, and in this batch we have 10 samples [ 6, 7, 8, 9, 0, 1, 2, 3, 4, 5 ]. What will the outcome be after batch normalization?

If you change 3x + 7 into 3x + 700, will the outcome change because of the change in the bias term?

Cheers,
Raymond
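If you want to check this numerically, here is a minimal NumPy sketch of Raymond’s example (the `batch_norm` helper and the choice of gamma = 1, beta = 0 are just for illustration, not course code):

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-8):
    """Normalize a batch of pre-activations z, then scale and shift."""
    mu = z.mean()
    var = z.var()
    z_norm = (z - mu) / np.sqrt(var + eps)
    return gamma * z_norm + beta

x = np.array([6, 7, 8, 9, 0, 1, 2, 3, 4, 5], dtype=float)

z_small_bias = 3 * x + 7      # neuron computing 3x + 7
z_large_bias = 3 * x + 700    # same neuron with the bias changed to 700

print(batch_norm(z_small_bias))
print(batch_norm(z_large_bias))
# Both prints are identical: the bias shifts the mean by exactly the same
# amount it shifts every z, so it cancels in (z - mu).
print(np.allclose(batch_norm(z_small_bias), batch_norm(z_large_bias)))  # True
```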


Thank you @rmwkwok!
I just tried plugging in the numbers, and the result shows no difference when the bias term changes. Appreciate it!


You are welcome, @ansonchantf!

Cheers!