I read the post by AshutoshSahu and the answers given there, but I am still confused. In that post I followed the answer given by raymond but didn’t get the last part.

import numpy as np
ary = np.array([
    [ [1, 2, 3,], [2, 3, 4,], ],
    [ [5, 4, 3,], [3, 2, 1,], ],
])

Suppose we have a NumPy array of shape (2,2,3). If we set axis=-1, is it taking the mean and variance of [1,2,3], [2,3,4], [5,4,3], [3,2,1] separately and then calculating the final values?

This array has 3 axes: 0, 1, and 2. So axis=-1 means doing the operation over axis 2. If we do a sum over axis 2 (or axis=-1), the result is a 2-dimensional array in which each value is the operation applied to the values along axis 2.

For instance:

ary.sum(axis=-1) = ary.sum(axis=2) =

[[ 6  9]
 [12  6]]

See how we arrived at shape=(2,2) from shape=(2,2,3) because we ‘consolidated’ everything on axis=-1 (the axis with index 2, counting from 0).

Same principle would apply to other operations like mean.

You can do some tests by doing things like:
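A minimal sketch of such tests, reusing the array from the question. The specific calls and the variable names (row_sums, row_means, row_vars) are assumptions chosen for illustration, not necessarily the tests the poster had in mind:

```python
import numpy as np

ary = np.array([
    [[1, 2, 3], [2, 3, 4]],
    [[5, 4, 3], [3, 2, 1]],
])

# Reduce over the last axis (axis=-1 is the same as axis=2 for a 3-axis array).
row_sums = ary.sum(axis=-1)
row_means = ary.mean(axis=-1)
row_vars = ary.var(axis=-1)   # population variance (ddof=0), NumPy's default

print(ary.shape)        # (2, 2, 3)
print(row_sums.shape)   # (2, 2)
print(row_sums)         # one sum per innermost list
print(row_means)        # one mean per innermost list
print(row_vars)         # one variance per innermost list
```

Comparing the shapes before and after each call shows which axis was collapsed.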


Running these tests may shed light on your question.

Hope it helps,


Tell me if I am right.
The normalization process basically calculates the mean and variance for each column, because each column represents a feature. That is why we pass axis=-1: we want the mean and variance values for each feature.

Normalization is scaling the input variables so that they have similar ranges of values. We don’t want, for instance, some variables with values under 100 and others with values over 100,000. With normalization we, well, normalize these inputs to prevent one variable from dominating the others.

One way to achieve normalization is by subtracting the mean of each variable and dividing by the standard deviation.
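As a rough illustration of that recipe, here is a small NumPy sketch. The data in X is made up for the example (4 samples, 2 features on very different scales), and the names mean, std, and X_norm are just illustrative:

```python
import numpy as np

# Hypothetical data: 4 samples (rows), 2 features (columns) on different scales.
X = np.array([
    [50.0, 100000.0],
    [60.0, 150000.0],
    [70.0, 200000.0],
    [80.0, 250000.0],
])

mean = X.mean(axis=0)       # one mean per feature (column)
std = X.std(axis=0)         # one standard deviation per feature
X_norm = (X - mean) / std   # broadcasting applies the per-feature values to every row

print(X_norm.mean(axis=0))  # approximately 0 for each feature
print(X_norm.std(axis=0))   # approximately 1 for each feature
```

After this step both columns live on a comparable scale, so neither dominates just because of its units.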

Hello @Utsav_Sharma1,

You need to tell us which function you are passing axis=-1 to, and what dataset you are talking about.

  1. Function: I suppose it is tf.keras.layers.Normalization.
  2. Dataset: I suppose the dataset has a shape of (m, n), where m is the number of samples, and n is the number of features.

Given the above two, your description is correct. You can also find a very similar description by reading the documentation of tf.keras.layers.Normalization, in which it says:

For example, if shape is (None, 5) and axis=1, the layer will track 5 separate mean and variance values for the last axis.
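To make the quoted sentence concrete, here is a NumPy-only sketch of the statistics it describes. This is not the Keras layer itself, only an emulation under the assumption that the axis argument names the axis that is kept and that the mean and variance are computed over all the other axes. The data X and the names reduce_axes, feature_mean, and feature_var are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))   # hypothetical batch: 8 samples, 5 features

# Keep the last axis (axis=-1); reduce over every other axis.
reduce_axes = tuple(range(X.ndim - 1))
feature_mean = X.mean(axis=reduce_axes)
feature_var = X.var(axis=reduce_axes)

print(feature_mean.shape)     # (5,)
print(feature_var.shape)      # (5,)
```

Note the difference from ary.sum(axis=-1) in the earlier reply: there axis names the axis being collapsed, while in this sketch axis=-1 is the axis that survives, giving one mean and one variance per feature.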