Batch Normalization with axis = -1

Hi Coursera community,

Could you please help me understand how batch normalization is performed over the last axis of the input (e.g., the color channels or filters)? In the lectures on Batch Normalization, for an input of shape (# examples, # features), the mean and variance are computed along the first axis (over the examples), separately for each feature. For convolutional networks, it is still unclear to me why we use axis = -1 instead.

I hope you can help me understand this better.
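A minimal NumPy sketch may help connect the two cases. It assumes a channels-last layout (m, height, width, channels), training mode only (no running averages), and a hypothetical helper `batch_norm` written for illustration. Here `axis` names the feature axis that is *kept*, which is how Keras's `BatchNormalization(axis=-1)` interprets it: there is one mean, variance, gamma and beta per index along that axis, and the statistics are pooled over every other axis. For a dense layer this reduces over the examples axis only, exactly as in the lectures; for a conv layer it also pools over the height and width positions, because every position in a channel is produced by the same filter.

```python
import numpy as np

def batch_norm(x, gamma, beta, axis=-1, eps=1e-5):
    """Hypothetical training-mode batch norm: one mean/variance (and one
    gamma/beta) per index along `axis`; statistics pooled over all other axes."""
    axis = axis % x.ndim
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    mean = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Reshape gamma/beta so they broadcast along the kept (feature) axis only.
    shape = [1] * x.ndim
    shape[axis] = x.shape[axis]
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)

rng = np.random.default_rng(0)

# Fully connected layer: (m examples, n features).
# Reduces over axis 0 only -> one mean/variance per feature, as in the lectures.
z_dense = rng.normal(3.0, 2.0, size=(64, 10))
out_dense = batch_norm(z_dense, np.ones(10), np.zeros(10), axis=-1)

# Conv layer, channels-last: (m, height, width, channels).
# Reduces over (m, height, width) -> one mean/variance per channel (filter).
z_conv = rng.normal(3.0, 2.0, size=(8, 5, 5, 16))
out_conv = batch_norm(z_conv, np.ones(16), np.zeros(16), axis=-1)

print(out_conv.mean(axis=(0, 1, 2)).shape)  # one statistic per channel: (16,)
```

The design choice is the same rule in both cases: normalize each feature over all the values that share that feature's parameters. In a conv layer a filter's weights are shared across all spatial positions, so its "feature" is the whole channel, and normalizing per channel keeps the learned gamma and beta consistent across positions.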

I have answered a similar post previously: