Hello, @nadidixit,
I will build on Paul’s explanation in my response to your question about why we need the `axis` argument at all.
When we pass an input to the Normalization layer, we pass an array. Let’s consider our usual m × n array: m samples of n features each.
To normalize, we first compute a mean and a standard deviation (S.D.). The question is: which values in the array do we compute the mean (and S.D.) over? From the lecture, we know the normalization is done per feature. In other words, we have one mean and one S.D. value per feature, so if we look at the following array example, we want to track 3 pairs of mean and S.D., i.e. the blue ones at the bottom.
While the blue ones are possible, the red ones are possible too, and since TensorFlow does not design the Normalization layer just for our MLS courses, it exposes the `axis` argument: we get the blue ones with `axis=1` and the red ones with `axis=0`.
While we may name the dimensions of such an array row and column, TensorFlow numbers them 0 and 1 respectively. That’s why, for this Normalization layer, `axis=1` asks the layer to normalize across each column, `axis=0` across each row, and `axis=(0,1)` across each element.
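To make the blue/red distinction concrete, here is a small NumPy sketch with a made-up 2 × 3 array (not from the course material). One easy trap to note: NumPy’s `axis` argument names the axis that gets *reduced away*, while the Normalization layer’s `axis` names the axis that *keeps* its own statistics, so the two conventions use opposite numbers for the same result.

```python
import numpy as np

# A hypothetical 2 x 3 array: 2 samples (rows), 3 features (columns).
X = np.array([[1.0, 2.0, 3.0],
              [3.0, 6.0, 9.0]])

# "Blue": one mean/S.D. pair per feature (3 pairs, one per column).
# np.mean reduces over the axis we pass, so per-column statistics come
# from reducing axis 0 -- the opposite numbering from the Normalization
# layer, whose axis names the axis that keeps its own statistics.
feature_mean = X.mean(axis=0)   # [2. 4. 6.]
feature_std = X.std(axis=0)     # [1. 2. 3.]

# "Red": one mean/S.D. pair per sample (2 pairs, one per row).
sample_mean = X.mean(axis=1)    # [2. 6.]

# Normalizing with the per-feature ("blue") statistics, as in the lectures:
X_norm = (X - feature_mean) / feature_std
print(X_norm)  # every feature now has mean 0 and S.D. 1
```

With this tiny array, each column of `X_norm` ends up as [-1, 1], so each feature has mean 0 and S.D. 1, which is exactly what the lectures describe.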
As Paul explained, `axis=-1` means the last axis/dimension, so in the case above `axis=-1` is an alias for `axis=1`, which means we take the mean and S.D. across each feature, consistent with this course.
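The `-1` is just Python’s usual negative indexing applied to axes: counting from the end, the last axis of a 2-D array is axis 1. A quick NumPy check (the numbers here are arbitrary, only for illustration):

```python
import numpy as np

X = np.arange(6, dtype=float).reshape(2, 3)  # a 2-D array: axes 0 and 1

# Negative axis numbers count from the end, so for a 2-D array
# axis=-1 and axis=1 pick out the same (last) axis.
print(np.array_equal(X.mean(axis=-1), X.mean(axis=1)))  # True
```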
Cheers,
Raymond