ckim
When estimating the variance, batch normalization uses
the mean mu = 1/m * sum(x_i)
the variance sigma^2 = 1/m * sum((x_i - mu)^2)
but the usual unbiased estimator of the variance is
sigma^2 = 1/(m-1) * sum((x_i - mu)^2)
since dividing by m gives a biased estimate. Why does batch normalization divide by m and not by m-1 here?
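For concreteness, a minimal sketch of the two estimators in PyTorch (the tensor x and batch size m are just placeholders):

```python
import torch

m = 8
x = torch.randn(m)                        # a batch of m scalar activations

mu = x.mean()
var_biased = ((x - mu) ** 2).sum() / m          # what batchnorm uses: divide by m
var_unbiased = ((x - mu) ** 2).sum() / (m - 1)  # the "textbook" unbiased estimator

print(torch.allclose(var_biased, x.var(unbiased=False)))   # True
print(torch.allclose(var_unbiased, x.var(unbiased=True)))  # True
```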
Great question @ckim!
From the batchnorm paper, the unbiased variance estimate is actually used during inference: the moving variance is rescaled so that

Var[x] = m/(m-1) * E_B[sigma_B^2]

where the expectation is taken over training mini-batches of size m.
For completeness, the batchnorm training transform is summarized by

mu_B = 1/m * sum(x_i)
sigma_B^2 = 1/m * sum((x_i - mu_B)^2)
x_hat_i = (x_i - mu_B) / sqrt(sigma_B^2 + eps)
y_i = gamma * x_hat_i + beta

and the inference transform by

y = gamma * (x - E[x]) / sqrt(Var[x] + eps) + beta
However, as you point out, it has been debated before whether it is better to use the unbiased variance estimate during training as well.
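To see this split in behavior directly, here is a quick sketch against nn.BatchNorm1d (momentum=1.0 is set only so that running_var equals the latest batch statistic; eps is the layer default). The training-time output is normalized with the biased (1/m) variance, while the running variance is updated with the unbiased (1/(m-1)) estimate:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

m = 8
x = torch.randn(m, 3)                    # batch of m samples, 3 features

bn = nn.BatchNorm1d(3, momentum=1.0)     # momentum=1.0: running_var becomes the current batch statistic
bn.train()
out = bn(x)

mu = x.mean(dim=0)
var_biased = x.var(dim=0, unbiased=False)    # divides by m
var_unbiased = x.var(dim=0, unbiased=True)   # divides by m - 1

# Training-time normalization uses the biased (1/m) variance ...
expected = (x - mu) / torch.sqrt(var_biased + bn.eps)
print(torch.allclose(out, expected, atol=1e-5))                  # True

# ... while the running variance is updated with the unbiased (1/(m-1)) estimate
print(torch.allclose(bn.running_var, var_unbiased, atol=1e-5))   # True
```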