Batch Normalization

bgoyal · May 31, 2021, 7:12am

I’m a bit confused by the notation in this slide. I had assumed that the normalization would be carried out separately for each training example (so m means and m variances). But the summation index i seems to run over the training examples. Do we compute the mean and variance across all training examples together and use those values?

nramon · May 31, 2021, 6:43pm

Hi, @bgoyal.

Batch norm normalizes each feature independently, not each example. A single example gives you only one observation per feature, which isn’t enough to estimate a mean or a variance. So yes, you compute each feature’s mean and variance across the whole mini-batch during training (and maintain moving averages of them that are used during inference).
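
To make the per-feature idea concrete, here is a minimal NumPy sketch. It is not the course's implementation: the function names, the `(m, n_features)` layout (the course slides may use the transposed layout), the `momentum` of 0.9 and `eps` of 1e-5 are all assumptions chosen for illustration.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, running_mean, running_var,
                     momentum=0.9, eps=1e-5):
    """Hypothetical helper. x has shape (m, n_features): m examples per mini-batch."""
    # One mean and one variance per feature, computed across the m examples.
    mu = x.mean(axis=0)    # shape (n_features,)
    var = x.var(axis=0)    # shape (n_features,), biased estimate (divides by m)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Moving averages of the batch statistics, kept for use at inference time.
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * x_hat + beta, running_mean, running_var

def batch_norm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Hypothetical helper. Uses the stored moving averages, not batch statistics."""
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))   # m = 64 examples, 4 features
gamma, beta = np.ones(4), np.zeros(4)
out, rm, rv = batch_norm_train(x, gamma, beta, np.zeros(4), np.ones(4))
print(out.mean(axis=0))   # per-feature means of the normalized batch
print(out.std(axis=0))    # per-feature standard deviations
```

With `gamma = 1` and `beta = 0`, each column of `out` should come out with mean close to 0 and standard deviation close to 1, while the rows (individual examples) are not normalized on their own.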

Here are all the details in case you’re interested

AbhijeetKrishnan · May 31, 2021, 6:58pm

Is there a difference between using $\sqrt{\sigma^2 + \epsilon}$ and $\sqrt{\sigma^2} + \epsilon$, the latter being the form the Adam optimizer uses in its denominator?

nramon · May 31, 2021, 10:29pm

Hi, @AbhijeetKrishnan.

Both avoid division by zero, but I don’t know whether one of the approaches works better in this particular context. Whichever one you pick, using it consistently is definitely important.
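
For a rough feel of how the two denominators differ numerically, here is a small sketch. The helper names and the value `eps = 1e-5` are assumptions for illustration, and it only compares the denominators themselves; it says nothing about which one trains better.

```python
import math

EPS = 1e-5  # assumed value, only for illustration

def denom_inside(var, eps=EPS):
    """Hypothetical helper: epsilon added inside the square root."""
    return math.sqrt(var + eps)

def denom_outside(var, eps=EPS):
    """Hypothetical helper: epsilon added outside the square root."""
    return math.sqrt(var) + eps

for var in (0.0, 1e-6, 1.0):
    print(f"var={var:g}  inside={denom_inside(var):.6g}  outside={denom_outside(var):.6g}")
```

When the variance is large compared with epsilon, the two are nearly identical. When the variance is zero, the first form gives $\sqrt{\epsilon}$ (about 0.00316 here) while the second gives $\epsilon$ itself (0.00001), so the effective floor on the denominator differs a lot for the same epsilon.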
