Is there a way to visualize the impact of batch normalization on a neural network?
For example, some sort of simulation?
I understand from the course that it normalizes the inputs to each layer using the mini-batch mean and variance, so that all features fed to the subsequent layers are on the same scale, in order to minimize “drifting/shifting” of their distribution.
It depends on what you are looking for. Maybe this blog post can get you started, or this one. A more complex approach can be found here.
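If you just want to see the effect in numbers, here is a minimal sketch of a simulation, not a real training setup. It pushes one random batch through a stack of randomly initialized ReLU layers and records the standard deviation of the pre-activations at each layer, once without and once with batch normalization. Everything in it is my own assumption for illustration: the function names (`linear`, `batch_norm`, `simulate`), the layer sizes, and the deliberately badly scaled weights (standard deviation 1). The batch norm here uses batch statistics only and leaves out the learnable scale and shift parameters.

```python
import math
import random


def linear(features, weights):
    """Apply a bias-free linear layer. `features` is a list of columns (one list per feature)."""
    batch = len(features[0])
    return [
        [sum(w * col[b] for w, col in zip(row, features)) for b in range(batch)]
        for row in weights
    ]


def batch_norm(features, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the batch.

    Assumption: no learnable scale/shift, i.e. gamma = 1 and beta = 0.
    """
    normalized = []
    for col in features:
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        normalized.append([(v - mean) / math.sqrt(var + eps) for v in col])
    return normalized


def simulate(use_batch_norm, depth=6, width=32, batch=64, seed=0):
    """Return the standard deviation of the pre-activations at every layer."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in range(batch)] for _ in range(width)]
    stds = []
    for _ in range(depth):
        # Weights with std 1 are too large for this width, on purpose.
        weights = [[rng.gauss(0, 1) for _ in range(width)] for _ in range(width)]
        z = linear(x, weights)
        if use_batch_norm:
            z = batch_norm(z)
        flat = [v for col in z for v in col]
        mean = sum(flat) / len(flat)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)))
        x = [[max(0.0, v) for v in col] for col in z]  # ReLU
    return stds


if __name__ == "__main__":
    plain = simulate(use_batch_norm=False)
    normed = simulate(use_batch_norm=True)
    for layer, (p, n) in enumerate(zip(plain, normed), start=1):
        print(f"layer {layer}: std without BN = {p:10.2f}   std with BN = {n:.4f}")
```

Without batch normalization the scale of the activations grows from layer to layer, while with it each layer sees inputs at roughly unit scale. You could plot the two lists (or histograms of the activations) with any plotting library to turn this into an actual visualization.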
I hope this gives you some idea.