Understanding error because of axis parameter in Normalization

I am trying to build better intuition for what the axis parameter in the Normalization layer does and how to use it. I think I understand it, but I am not sure why the program is giving errors for the code below.

The axis parameter selects the axis along which normalization happens. With axis = -1 (for the data structure in the code below), it normalizes the columns, which in machine learning means normalizing each individual feature/input across all the samples.

From the error, what I understood is that it does not like the shape of the matrix it needs to normalize, right? Maybe it wants 2-D, but I don't get why it is not getting a 2-D matrix, or is it something else?
What I expected to happen was that it would normalize each row; for example, if the first row is [2 4] then it would become [0 1].

import numpy as np
import tensorflow as tf

Input = np.random.choice([1, 2, 3, 4, 5], p=[0.1, 0.35, 0.25, 0.1, 0.2], size=(20, 2))
Observed = np.random.choice([0, 1], size=(20, 1))

norm_In = tf.keras.layers.Normalization(axis=-1)
norm_In_ax = tf.keras.layers.Normalization(axis=0)

norm_In.adapt(Input)  # learns mean and variance
norm_In_ax.adapt(Input)  # this line raises the ValueError below

norm_Input = norm_In(Input)
norm_Input_ax = norm_In_ax(Input)

ValueError                                Traceback (most recent call last)
Cell In[51], line 5
      2 norm_In_ax = tf.keras.layers.Normalization(axis=0)
      4 norm_In.adapt(Input) #learns mean and variance
----> 5 norm_In_ax.adapt(Input)
      7 norm_Input = norm_In(Input)
      8 norm_Input_ax = norm_In_ax(Input)

File c:\Users\tinnt\Documents\Learning\Machine Learning Specialization\Supervised Machine Learning Regression and Classification\.venv\Lib\site-packages\keras\src\layers\preprocessing\normalization.py:287, in Normalization.adapt(self, data, batch_size, steps)
    241 def adapt(self, data, batch_size=None, steps=None):
    242     """Computes the mean and variance of values in a dataset.
    244     Calling `adapt()` on a `Normalization` layer is an alternative to
    285           argument is not supported with array inputs.
    286     """
--> 287     super().adapt(data, batch_size=batch_size, steps=steps)

File c:\Users\tinnt\Documents\Learning\Machine Learning Specialization\Supervised Machine Learning Regression and Classification\.venv\Lib\site-packages\keras\src\engine\base_preprocessing_layer.py:258, in PreprocessingLayer.adapt(self, data, batch_size, steps)
    256 with data_handler.catch_stop_iteration():
    257     for _ in data_handler.steps():
--> 258         self._adapt_function(iterator)
    259         if data_handler.should_sync:
    260             context.async_wait()
    File "c:\Users\tinnt\Documents\Learning\Machine Learning Specialization\Supervised Machine Learning Regression and Classification\.venv\Lib\site-packages\keras\src\layers\preprocessing\normalization.py", line 188, in build
        raise ValueError(

    ValueError: All `axis` values to be kept must have known shape. Got axis: (0,), input shape: [None, 2], with unknown axis at index: 0

You posted in the “General Discussions” forum. So we do not know which course you are attending, or which assignment you’re asking about.

Please use the “pencil” tool in the thread title to move your post to the appropriate forum area.

I thought that, since my question came from trying to understand code that was outside the scope of the course, I should put it in General Discussion.

Hello @tinted,

ValueError: All axis values to be kept must have known shape. Got axis: (0,), input shape: [None, 2], with unknown axis at index: 0

From the message, it seems to me that TensorFlow just doesn’t want you to normalize along the zeroth axis, i.e. the batch axis. This may be because TensorFlow expects a variable sample size per batch. Putting yourself in TensorFlow’s shoes: if you were asked to normalize along the batch axis given only 20 samples this time, but it is legal to receive 1000 samples next time, how should TensorFlow behave?
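A minimal sketch of that behavior (assuming the same (20, 2) shape as in the original post): adapt() succeeds on the feature axis, whose size is known, but fails on the batch axis, whose size is None when the layer is built.

```python
import numpy as np
import tensorflow as tf

data = np.random.choice([1, 2, 3, 4, 5], size=(20, 2)).astype("float32")

# Per-feature normalization: the feature axis has a known size (2), so this works.
norm_feat = tf.keras.layers.Normalization(axis=-1)
norm_feat.adapt(data)
normalized = norm_feat(data)

# Batch-axis normalization: the batch dimension is None (unknown) when the
# layer is built, so adapt() raises the ValueError shown in the traceback above.
norm_batch = tf.keras.layers.Normalization(axis=0)
try:
    norm_batch.adapt(data)
except ValueError as err:
    print("ValueError:", err)
```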

Furthermore, given axis = 1, each normalization constant sticks to a certain feature, right? What would the meaning be with axis = 0? Each normalization constant sticks to what? We can order the same batch of samples in whatever way we want; there is no significance as to which sample comes first.

I suggest you experiment on other axes, perhaps with a 4-dimensional dataset so that you can test with axis = 1, 2, or 3?
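As a concrete sketch of that experiment (assuming image-like data of shape (batch, height, width, channels)), normalizing along the last axis gives one mean/variance pair per channel:

```python
import numpy as np
import tensorflow as tf

# Hypothetical 4-D batch: (batch_size, height, width, channels)
images = np.random.rand(8, 4, 4, 3).astype("float32")

norm = tf.keras.layers.Normalization(axis=-1)  # normalize per channel
norm.adapt(images)

# 3 sets of normalization factors, one per channel
print(int(tf.size(norm.mean)), int(tf.size(norm.variance)))

# Each channel of the output has roughly zero mean and unit variance
out = norm(images).numpy()
print(out[..., 0].mean(), out[..., 0].std())
```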


My only problem with experimenting with more dimensions is that it is hard to relate to and hard to calculate/predict what will happen, so I was trying with fewer dimensions first. With 4 or more dimensions it gets really hard to guess the results.

Let’s say the data is color images and the dimensions are [batch_size, height, width, color_channels]. With axis = -1 (i.e. 3), we will be normalizing along the channels, which means it will take all the “Red” channels in the batch and normalize them together. That is actually not hard to visualize, because images are easy to visualize and because the data being normalized has the same units: basically, red intensity values across the other three dimensions.

I think my confusion also stems from not knowing what the dimensions represent while experimenting. While trying to understand, I lazily took arbitrary dimensions like [batch, position, velocity, acceleration], which is a big blunder. It should be more like [index, features] or [time, features].

I can't think of any data matrix with more than 4 dimensions other than images right now. My final takeaway is that I should not expect to get intuition for the axis parameter without working with real data, because it depends on the data structure as well.

Hello @tinted,

Let me share with you how I expect the outcome.

If the shape is (None, 3), then taking axis = 1 means there are going to be 3 sets of normalization factors. If the shape is (None, 2, 7, 9, 1, 4, 6), then taking axis = 5 means there will be 4 sets of factors, because the size of the 5th axis is 4.
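That expectation can be checked directly (a quick sketch, using random data of that shape):

```python
import numpy as np
import tensorflow as tf

# Batch of 10 samples with shape (2, 7, 9, 1, 4, 6) each
data = np.random.rand(10, 2, 7, 9, 1, 4, 6).astype("float32")

norm = tf.keras.layers.Normalization(axis=5)
norm.adapt(data)

# Axis 5 has size 4, so there are 4 sets of normalization factors.
print(int(tf.size(norm.mean)))  # 4
```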

To see those factors, print norm_In.weights.
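For example, with the (20, 2) Input from the original post, the adapted factors should match NumPy's per-column mean and variance (a sketch; the exact list of weights can vary between Keras versions):

```python
import numpy as np
import tensorflow as tf

Input = np.random.choice(
    [1, 2, 3, 4, 5], p=[0.1, 0.35, 0.25, 0.1, 0.2], size=(20, 2)
).astype("float32")

norm_In = tf.keras.layers.Normalization(axis=-1)
norm_In.adapt(Input)

# Adapted mean and variance (plus a sample counter in some Keras versions)
print(norm_In.weights)

# Cross-check against NumPy's per-column statistics
print(np.mean(Input, axis=0), np.var(Input, axis=0))
```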



Cheers :wink:
