Batch Normalization

Batch normalization tries to reduce the problem of covariate shift, but can this lead to bias? What I mean is: the model has been trained on black cats, and if we test it on white cats, the distribution of the white cats will be different from the distribution of the black cats (as I understand it), but batch normalization will reduce these differences in the hidden layers. So if I give the network a white or black dog, batch normalization will try to reduce the differences in the hidden layers, and won't that lead to a wrong prediction?

Hey @Ibrahim_Mustafa,
The aim of batch normalization is to reduce the shift between the distributions of samples across mini-batches, not to reduce the shift between the distributions of samples belonging to different classes. That is why batch normalization has trainable parameters: with them, we can make sure that the samples across different mini-batches have almost the same distribution, while the samples belonging to different classes still have differing distributions. The new distributions might differ from the original ones, since we are now using the new statistics, i.e., the mean and the variance. Let us know if this resolves your query.
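
To make this concrete, here is a minimal NumPy sketch (purely illustrative, with made-up numbers; `gamma` and `beta` stand in for the trainable parameters) of what batch normalization computes for each mini-batch:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch, then apply the trainable scale and shift."""
    mu = x.mean(axis=0)                      # per-feature mean of this mini-batch
    var = x.var(axis=0)                      # per-feature variance of this mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # roughly zero mean, unit variance
    return gamma * x_hat + beta              # gamma and beta are learned

# Two mini-batches drawn from somewhat different distributions
rng = np.random.default_rng(0)
batch_1 = rng.normal(loc=2.0, scale=1.4, size=(64, 4))
batch_2 = rng.normal(loc=3.0, scale=0.95, size=(64, 4))

gamma, beta = np.full(4, 1.0), np.full(4, 0.5)   # hypothetical learned values
out_1 = batch_norm(batch_1, gamma, beta)
out_2 = batch_norm(batch_2, gamma, beta)

# Both mini-batches now have almost the same statistics (mean ~0.5, variance ~1.0)
print(out_1.mean(axis=0), out_1.var(axis=0))
print(out_2.mean(axis=0), out_2.var(axis=0))
```

Even though the raw mini-batches start out with different statistics, both outputs end up with nearly the same mean and variance, determined by `gamma` and `beta`.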

Cheers,
Elemento

I am sorry, but I didn't understand. Could you please explain with an example?

Hey @Ibrahim_Mustafa,
Sure, let me use an example in which we consider samples belonging to 2 mini-batches and 3 different classes. Note that both mini-batches contain samples of all 3 classes (though the number of samples of each class in each mini-batch might be different).

Let's say that the statistics, in the form of (mean, variance), for the 1st mini-batch are (2.01, 1.98), and for the 2nd mini-batch they are (3.01, 0.91). The difference in statistics could be due to various reasons; for instance, the sources of the majority of the samples in the 2 mini-batches might be different. Now, with the help of batch normalization, we might be able to bring these distributions closer to each other, for instance (0.51, 1.02) and (0.53, 1.05). In the absence of trainable parameters, both distributions would be of the form (0, 1).

Now, the thing to note here is that in this entire example, we haven't talked about the distributions of the different classes, because we don't need to. When we scale the distribution of the entire mini-batch, the distributions of the individual classes are scaled automatically. Earlier, the distributions of the samples belonging to classes “A” and “B” could have been (1.52, 1.21) and (2.61, 2.93); after batch normalization, the distributions could be (0.37, 0.32) and (0.89, 1.01).
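
As a quick sanity check, here is a rough sketch (again with made-up numbers, chosen to roughly match the (mean, variance) pairs above) showing that the per-class statistics stay separated after the whole mini-batch is normalized:

```python
import numpy as np

rng = np.random.default_rng(1)
class_a = rng.normal(loc=1.52, scale=1.10, size=(32, 1))   # class "A" samples, variance ~1.21
class_b = rng.normal(loc=2.61, scale=1.71, size=(32, 1))   # class "B" samples, variance ~2.93
batch = np.vstack([class_a, class_b])                       # one mini-batch containing both classes

# Normalize the whole mini-batch (scale/shift omitted for simplicity)
mu, var = batch.mean(axis=0), batch.var(axis=0)
batch_hat = (batch - mu) / np.sqrt(var + 1e-5)

# Per-class means after normalization: still clearly separated
print(batch_hat[:32].mean())   # class "A": below zero
print(batch_hat[32:].mean())   # class "B": above zero
```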

As we can see, the distributions belonging to the different classes are still different from each other, though they have been scaled accordingly. I hope this example resolves your query.

Cheers,
Elemento
