DLS Course 4, 2nd Week, First assignment

I want to know why the assignment says that 'BatchNorm is normalizing the channels axis', when the documentation states that we usually normalize over the features axis.

My second question: I understand the concept of normalization, but what does it actually mean to normalize only over the channels axis, compared to normalizing over all axes or just over the m axis (the training-examples axis)?
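To make the question concrete, here is a small NumPy sketch of what I think the two options mean. The shapes, the random data, and the epsilon value are just my assumptions for illustration, not taken from the assignment:

```python
import numpy as np

# Hypothetical mini-batch of images, shape (m, H, W, C) = (examples, height, width, channels)
m, H, W, C = 8, 4, 4, 3
X = np.random.randn(m, H, W, C)

# "Normalize over the channels axis" (as in Keras BatchNormalization(axis=3)):
# one mean/variance PER CHANNEL, computed over all examples and all spatial positions.
mu_c  = X.mean(axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, C)
var_c = X.var(axis=(0, 1, 2), keepdims=True)
X_bn_channels = (X - mu_c) / np.sqrt(var_c + 1e-5)

# Normalizing only over the m (training-examples) axis instead:
# one mean/variance per (h, w, c) position, computed across the batch.
mu_m  = X.mean(axis=0, keepdims=True)            # shape (1, H, W, C)
var_m = X.var(axis=0, keepdims=True)
X_bn_examples = (X - mu_m) / np.sqrt(var_m + 1e-5)

print(X_bn_channels.shape, X_bn_examples.shape)  # both (8, 4, 4, 3)
```

Is my understanding correct that in the first case the statistics are shared across all positions in a channel, while in the second case each pixel position gets its own statistics?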

Thank you!

Hi Mohamed Gallai,

Welcome to the community.

Have a look at this thread, which gives an idea of what batch normalization is and why it is used.
