Hi, in the introduction to Batch Normalization, the professor mentions how a classifier trained to recognize cats would not do very well if it was trained on black cats and then tested on colored cats. I believe Batch Normalization can address this by normalizing the inputs relative to their own data set.
However, in the video on determining the final mean/variance used to scale the test data, the professor mentions using an exponentially weighted average of the means/variances of the mini-batches. This makes sense if the test data comes from the same distribution as the training data. I was wondering, though: if the data were like the cat example (training = black cats, test = colored cats), would this method of obtaining the final mean/variance still work? My intuition, if I were applying this model in the real world, would be to collect colored-cat data and compute a new mean/variance from it, which should let the model transfer over much better. Is that something that is done in practice?
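To make my question concrete, here is a rough sketch in Python/NumPy of the two options as I understand them. The function names, the decay rate `beta`, and the placeholder data are all my own for illustration, not from the lecture:

```python
import numpy as np

def update_running_stats(running_mean, running_var, batch, beta=0.9):
    """Exponentially weighted average of mini-batch statistics,
    as I understand the lecture's approach. beta is a decay
    rate I chose just for this example."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    running_mean = beta * running_mean + (1 - beta) * batch_mean
    running_var = beta * running_var + (1 - beta) * batch_var
    return running_mean, running_var

def normalize(x, mean, var, eps=1e-5):
    """Standard BN normalization step, using whichever
    mean/variance we decide to plug in."""
    return (x - mean) / np.sqrt(var + eps)

# Option 1 (lecture): normalize test data with the running averages
# accumulated over the training mini-batches (black cats).
# Option 2 (my intuition): recompute mean/variance on a sample from
# the new distribution (colored cats) and use those instead.
colored_cat_sample = np.random.randn(256, 64)  # placeholder features
new_mean = colored_cat_sample.mean(axis=0)
new_var = colored_cat_sample.var(axis=0)
x_test_normalized = normalize(colored_cat_sample, new_mean, new_var)
```

So my question is essentially whether Option 2 is a legitimate thing to do when the test distribution has shifted, or whether it breaks some assumption of the trained network.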