Why does the axis argument behave differently in TensorFlow's reduce_mean and the Normalization layer?

I am trying to implement normalization using NumPy and compare it against the TensorFlow Normalization layer (I first saw this layer in the C2_W1_Lab03_CoffeeRoasting_Numpy assignment):

```python
import tensorflow as tf

x = tf.random.uniform((32, 5))
n = tf.keras.layers.Normalization(axis=0)
n.adapt(x)  # builds the layer and computes its statistics
n.mean.shape  # gives TensorShape([32, 1])

tf.reduce_mean(x, axis=0).shape  # gives TensorShape([5])
```

Hello @tbhaxor,

Great work trying it out yourself and listing the shapes! tf.keras.layers.Normalization is designed to work that way, and as the documentation says:

Integer, tuple of integers, or None. The axis or axes that should have a separate mean and variance for each index in the shape. For example, if shape is (None, 5) and axis=1, the layer will track 5 separate mean and variance values for the last axis.
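To make the quoted behavior concrete, here is a small NumPy sketch of the bookkeeping the docs describe (NumPy stands in for the layer's internals here; the (32, 5) shape is just the example from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=(32, 5))  # the docs' "(None, 5)" example shape

# axis=1 means: keep a separate mean/variance for each of the 5
# indices along the last axis, so the average runs over axis 0.
per_feature_mean = x.mean(axis=0)
print(per_feature_mean.shape)  # (5,) -- 5 separate means
```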

Setting axis=0 in Normalization tells the layer to keep a separate mean and variance for each index along the zeroth axis. To compute those statistics, it averages over the remaining axis (axis 1), which is why n.mean has shape (32, 1).

Setting axis=0 in reduce_mean computes the mean by reducing (averaging over) the zeroth axis, so that axis disappears and the result has shape (5,).

Both conventions sound reasonable on their own, but as for why the two APIs were designed differently, we would really need to ask the developers…