Diagram for "normalization" (input and batch) in a 2-layer ANN

Aaaand we have a diagram showing “normalization” (both input normalization and batch normalization) in a 2-layer ANN, at least for the forward-propagation phase.
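
Here is a minimal NumPy sketch of that forward pass, just to make the picture concrete. The layer sizes, the ReLU/sigmoid activations, the `eps` value, and the names (`normalize_inputs`, `batchnorm_forward`, `gamma1`, `beta1`, …) are my own assumptions, not read off the diagram; columns of `X` are examples.

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """Input normalization: zero mean / unit variance per feature (per row of X)."""
    mu = np.mean(X, axis=1, keepdims=True)    # mean over the examples
    var = np.var(X, axis=1, keepdims=True)    # variance over the examples
    return (X - mu) / np.sqrt(var + eps)

def batchnorm_forward(Z, gamma, beta, eps=1e-8):
    """Batch normalization of the pre-activations Z of one layer."""
    mu = np.mean(Z, axis=1, keepdims=True)
    var = np.var(Z, axis=1, keepdims=True)
    Z_hat = (Z - mu) / np.sqrt(var + eps)     # normalized pre-activations
    return gamma * Z_hat + beta               # learned scale and shift

def forward_2layer(X, params):
    """Forward prop: normalize inputs, batch-norm layer 1, ReLU, sigmoid output."""
    X_norm = normalize_inputs(X)
    Z1 = params["W1"] @ X_norm                # no b1: beta1 plays that role
    A1 = np.maximum(0, batchnorm_forward(Z1, params["gamma1"], params["beta1"]))
    Z2 = params["W2"] @ A1 + params["b2"]
    return 1.0 / (1.0 + np.exp(-Z2))          # sigmoid output

rng = np.random.default_rng(0)
params = {
    "W1": 0.1 * rng.normal(size=(4, 3)),
    "gamma1": np.ones((4, 1)), "beta1": np.zeros((4, 1)),
    "W2": 0.1 * rng.normal(size=(1, 4)), "b2": np.zeros((1, 1)),
}
X = rng.normal(size=(3, 16))                  # 3 input features, 16 examples
print(forward_2layer(X, params).shape)        # (1, 16)
```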

The backward propagation seems a bit more difficult :grimacing:
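
Most of the extra work in backprop is getting the gradient through the batch-norm step, because the batch mean and variance themselves depend on every example. Below is a sketch of the standard compact formula for just that step; the cached quantities and names (`dY`, `Z_hat`, `inv_std`) are assumptions about what one would store during the forward pass.

```python
import numpy as np

def batchnorm_backward(dY, Z_hat, gamma, inv_std):
    """Gradients through Y = gamma * Z_hat + beta, where
    Z_hat = (Z - mu) / sqrt(var + eps) with mu, var taken over the batch."""
    m = dY.shape[1]                                        # examples in the batch
    dgamma = np.sum(dY * Z_hat, axis=1, keepdims=True)
    dbeta = np.sum(dY, axis=1, keepdims=True)
    dZ_hat = dY * gamma
    # Because mu and var depend on every example, dZ picks up two correction
    # terms beyond the naive dZ_hat * inv_std.
    dZ = (inv_std / m) * (m * dZ_hat
                          - np.sum(dZ_hat, axis=1, keepdims=True)
                          - Z_hat * np.sum(dZ_hat * Z_hat, axis=1, keepdims=True))
    return dZ, dgamma, dbeta
```

A numerical gradient check against finite differences is the usual way to convince oneself that all three terms are right.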

For input normalization, would one really compute the mean and variance over the whole example set?

It would probably be sufficient to select a “large enough” random sample of the training examples and compute the sample mean and sample variance from it.

Or maybe even normalize the input on a per-batch basis.
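
A quick numerical check of the sampling idea (synthetic data, arbitrary sizes, all names mine): the mean and variance of a few thousand randomly chosen examples are usually close to the full-dataset values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(3, 100_000))       # 3 features, 100k examples

full_mu, full_var = X.mean(axis=1), X.var(axis=1)            # whole-set statistics
idx = rng.choice(X.shape[1], size=2_000, replace=False)      # "large enough" sample
samp_mu, samp_var = X[:, idx].mean(axis=1), X[:, idx].var(axis=1)

print("mean abs diff:", np.abs(full_mu - samp_mu))           # small relative to the true mean 5
print("var  abs diff:", np.abs(full_var - samp_var))         # small relative to the true variance 9
```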