Course 2 Week 3, Question on Batch Normalization addressing Covariate Shift

Hi, in the introduction to Batch Normalization, the professor mentions how a classifier trained to recognize cats would not do very well if it was trained on black cats and then tested on colored cats. I believe Batch Normalization can help address this by normalizing the inputs within each data set.

However, in the video on how to determine the final mean/variance used to scale test data, the professor mentions using an exponentially weighted average of the means/variances of the mini-batches. This makes sense if the test data comes from the same distribution. I was wondering, however, whether this method of obtaining the final mean/variance would still work if the data was like the cat example (training = black cats, test = colored cats)? My intuition, if I were to apply this model to the real world, would be to collect colored-cat data to obtain a new mean/variance, which should let the model transfer over much better. Is that something that is done in practice?
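For concreteness, here is a minimal NumPy sketch of the test-time mechanism being asked about: during training, a running (exponentially weighted) average of each mini-batch's per-feature mean and variance is maintained, and at inference those running statistics are used instead of batch statistics. The function names and the decay rate 0.9 are illustrative assumptions, not from the course videos.

```python
import numpy as np

def update_running_stats(running_mean, running_var, batch, decay=0.9):
    """Update exponentially weighted averages of per-feature mean/variance
    from one mini-batch of shape (batch_size, n_features)."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    running_mean = decay * running_mean + (1 - decay) * batch_mean
    running_var = decay * running_var + (1 - decay) * batch_var
    return running_mean, running_var

def batchnorm_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """At test time, normalize with the running statistics (not the test
    batch's own statistics), then apply the learned scale/shift."""
    x_norm = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_norm + beta
```

If the training mini-batches all come from one distribution (e.g. black cats), the running mean/variance converge to that distribution's statistics, which is exactly why they may be a poor fit for test data drawn from a shifted distribution.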

Hey @daHsu,
Welcome to the community. Prof Andrew explains why Batch Normalization helps in the case of covariate shift in the video entitled “Why does Batch Norm work?”.

Here, when I say “batch normalization”, I also include the concept of using an exponentially weighted average to estimate \mu and \sigma^2 at test time (note that \gamma and \beta are learned parameters, not averaged). I would suggest reviewing the video once again. I hope this helps.