Hi Sir,
@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha
This is my understanding of batch norm. Could you please tell me whether my understanding is correct?
At each iteration, the parameter updates change the hidden unit values of the earlier layers. Because of these changes, the distribution of those hidden unit values keeps shifting, so the later layers find it hard to adapt to / learn from the features coming from the previous layer. Thus learning is harder and convergence becomes slower.
If we apply batch norm, the distribution of the hidden units in the earlier layers does not change as much from iteration to iteration, which makes it easier for the later layers to adapt to / learn from the previous layer's hidden units, so convergence becomes faster. To make my understanding concrete, this is a minimal sketch of what I think batch norm does within one mini-batch (assuming the z shape of (n_units, m) used in the course; gamma, beta, and eps are just illustrative names):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Normalize a mini-batch of one layer's hidden-unit values so their
    distribution stays roughly fixed (mean 0, variance 1), then apply the
    learnable scale (gamma) and shift (beta)."""
    mu = np.mean(z, axis=1, keepdims=True)     # per-unit mean over the mini-batch
    var = np.var(z, axis=1, keepdims=True)     # per-unit variance over the mini-batch
    z_norm = (z - mu) / np.sqrt(var + eps)     # standardized values
    z_tilde = gamma * z_norm + beta            # learnable scale and shift
    return z_tilde

# Example: a layer with 4 hidden units, mini-batch of 32 examples
z = np.random.randn(4, 32) * 5 + 3             # arbitrary shifted/scaled activations
gamma = np.ones((4, 1))
beta = np.zeros((4, 1))
z_tilde = batch_norm_forward(z, gamma, beta)
print(z_tilde.mean(axis=1))                    # roughly 0 for each unit
print(z_tilde.std(axis=1))                     # roughly 1 for each unit
```

So even if the raw z values shift around as the earlier layers' parameters update, the normalized values the next layer sees keep roughly the same mean and variance.
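(The code above is only a sketch of the forward normalization step, not the full training/inference behaviour with running averages.)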
Am I right, sir?