C2_W1_Lab02_CoffeeRoasting_normalization

Hey Hey,

I have a question on C2_W1 optional lab 2, where a neural network is built using TensorFlow. I specifically had a question on the code for normalizing the input data:

norm_l = tf.keras.layers.Normalization(axis=-1)

I am not sure I understand what the “axis” argument and the chosen value -1 are doing in this function. Could someone please get back to me on this? I tried looking it up, but I don’t think I got it.

Thanks a lot!
Nadi

When used as an index value in Python, -1 just means “the last element”. So that means the last axis, whatever that is. They sometimes use that instead of explicitly setting the axis because they want to be flexible about whether the model is handling a batch of inputs or just a single input value that is missing the “samples” dimension.
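
Here is a quick sketch (just an illustration with made-up shapes, not the lab’s data) showing that -1 always picks out the last position, whether or not the “samples” dimension is there:

import numpy as np

v = [10, 20, 30]
print(v[-1])                # 30 -- index -1 is the last element

X_batch = np.zeros((4, 3))  # a batch of 4 samples with 3 features
x_single = np.zeros((3,))   # one sample, no "samples" dimension
print(X_batch.shape[-1])    # 3 -- the last axis is the feature axis
print(x_single.shape[-1])   # 3 -- still the feature axis, even without the batch dimension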

Hello, @nadidixit,

I will make use of Paul’s explanation in my response, and address why the “axis” argument is needed at all.

When we pass an input to the Normalization layer, we pass an array. Let’s consider our usual m \times n array: m samples with n features each.

To normalize, we first compute the mean and standard deviation (S.D.). The question is: which values in the array do we calculate the mean (and S.D.) over? From the lecture, we know the normalization is done per feature. In other words, we have one mean and one S.D. value per feature, so for an example array with 3 feature columns, we want to track 3 pairs of mean and S.D., one pair per column.

While those per-feature (per-column) statistics are what we want here, per-sample (per-row) statistics are also possible, and TensorFlow does not design the Normalization layer only for our MLS courses. That is why it exposes the axis argument: specifying axis=1 gives us the per-feature statistics, while axis=0 gives us the per-sample ones.

While we may name the dimensions of such an array “row” and “column”, TensorFlow names them with the numbers 0 and 1 respectively. That’s why, for this Normalization layer, when we say axis=1, we are asking the layer to keep a separate mean and S.D. for each column. Likewise, when we say axis=0, it is one per row, and when we say axis=(0,1), it is one per element.
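
If it helps to see this with numbers, here is a small sketch using a made-up 4 \times 3 array (not the lab’s data). One caveat: np.mean and np.std take the axis to average over, which is the opposite convention from the Normalization layer’s axis (the axis that gets one statistic per index):

import numpy as np

# Made-up 4 x 3 array: 4 samples (rows), 3 features (columns)
X = np.array([[200.0, 13.9, 1.0],
              [210.0, 14.5, 0.8],
              [190.0, 12.1, 1.2],
              [205.0, 15.0, 0.9]])

# One mean/S.D. pair per feature: average over the samples (down each column)
print(X.mean(axis=0), X.std(axis=0))   # each has shape (3,)

# One mean/S.D. pair per sample: average over the features (along each row)
print(X.mean(axis=1), X.std(axis=1))   # each has shape (4,)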

As Paul explained, axis=-1 means the last axis/dimension, so in the case above axis=-1 is an alias for axis=1. That means we take one mean and S.D. per feature, which is consistent with this course.
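
To tie this back to the lab’s line of code, here is a minimal sketch (again with made-up numbers, not the coffee-roasting data) of what the layer does once it has adapted:

import numpy as np
import tensorflow as tf

X = np.array([[200.0, 13.9],
              [210.0, 14.5],
              [190.0, 12.1]], dtype=np.float32)  # 3 samples, 2 features

norm_l = tf.keras.layers.Normalization(axis=-1)
norm_l.adapt(X)        # learns one mean and one variance per feature (2 pairs)
Xn = norm_l(X)         # subtracts each column's mean and divides by its S.D.

print(Xn.numpy().mean(axis=0))  # close to [0, 0]
print(Xn.numpy().std(axis=0))   # close to [1, 1]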

Cheers,
Raymond

Hi Raymond,

Thanks a lot for the great explanation! It’s very clear to me now :)))

Cheers
Nadi

You are welcome, Nadi!