This code creates an instance of the Normalization layer in TensorFlow. What does the axis parameter -1 mean? The TensorFlow documentation says it refers to the feature dimension, so what does "dimension" mean here? Shouldn't it be 1, to indicate that we normalize the columns?

Hi, 10 days ago I was stuck on this same problem for 2 days, so let me share what I found.

In NumPy and TensorFlow, the axis parameter is used to specify along which axis an operation should be performed. When you specify axis=-1, it indicates that the operation should be applied along the last axis of the array.

In a 2D array, the last axis (axis=-1) corresponds to the columns, i.e. it is the same as axis=1. This convention makes sense when considering the typical use cases for operations like normalization and summing:

For normalization in TF, it's common to normalize features (columns) in datasets. Each column represents a different feature, and it's typical to want to normalize each feature independently (i.e. vertically).

For summing with np.sum, specifying axis=-1 sums across each row (i.e. horizontally). This is often needed when, for example, you want to compute row-wise sums in matrices representing data points or observations.

Therefore, the behavior of axis=-1 being interpreted as operating along the columns for normalization and along the rows for summing aligns with the typical use cases and conventions in data analysis and machine learning.
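To make the two cases concrete, here is a small NumPy sketch. The array x and its values are made up for illustration, and the normalization part is only a hand-written approximation of the per-feature statistics that a normalization layer with axis=-1 would keep, not the layer's actual implementation.

```python
import numpy as np

# Hypothetical data: 2 rows (samples), 3 columns (features).
x = np.array([[1.0, 10.0, 100.0],
              [3.0, 30.0, 300.0]])

# Summing: axis=-1 is the last axis, which is axis=1 for a 2D array.
# The column axis is collapsed, leaving one value per row.
row_sums = np.sum(x, axis=-1)
same_as_axis_1 = np.sum(x, axis=1)

# Per-feature normalization: one mean and one variance per column,
# so the statistics are computed over the row axis (axis=0).
mean = x.mean(axis=0)
var = x.var(axis=0)
normalized = (x - mean) / np.sqrt(var)

print(row_sums)      # one entry per row
print(mean, var)     # one entry per column
print(normalized)
```

Note that the statistics have one entry per column even though the user-facing axis argument for the layer would be -1; that is the apparent mismatch discussed in this thread.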

While this convention may seem counterintuitive at first glance, it's consistent with the way arrays are indexed and used in numerical computing libraries like NumPy and TensorFlow. Once you become familiar with this convention, it becomes easier to understand and work with array operations in these libraries. The cause may be that TF and NumPy were initially developed independently.

I am not fully convinced by this answer; I have just made peace with it. If you find a better explanation, please let me know.

I agree with most of your comments, but for the sake of discussion, let me share my version:

Consider a 2D array x that has the row axis and the column axis.

For most array operations, we specify which axis to do away with. For np.sum(x, axis=-1), we do away with the last axis by summing over it, leaving only the row axis, and thus we call it a row-wise operation.

Normalizations are the exception: we specify the axis for which we want to create and keep the normalization constants. For tf.keras.layers.Normalization(axis=-1)(x), we keep constants for the column axis and thus call it a column-wise operation.

When thinking about what kind of operation it is:

For sum, we do away with the columns and keep the rows. We call it a row-wise operation.

For normalization, we keep the constants of the columns. We call it a column-wise operation.

When thinking about what to specify for axis:

For most array operations, it is about what to do away with.

For normalizations (and batch normalization), it is about what to keep.
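Here is a rough sketch of the "do away with" versus "keep" reading, using plain NumPy on a 3D array. The names keep_axis and reduce_axes are my own for illustration, and the normalization is a hand-rolled stand-in under the assumption that the statistics are reduced over every axis except the kept one.

```python
import numpy as np

# Hypothetical 3D input with shape (4, 5, 3); the last axis holds 3 features.
x = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)

# Sum: `axis` names what is done away with.
summed = np.sum(x, axis=-1)
print(summed.shape)        # the last axis is gone

# Normalization: `axis` names what is kept, so every other axis is reduced.
keep_axis = -1
reduce_axes = tuple(i for i in range(x.ndim) if i != keep_axis % x.ndim)
mean = x.mean(axis=reduce_axes)
var = x.var(axis=reduce_axes)
print(mean.shape)          # one constant per index of the kept axis

# The kept constants broadcast against the last axis of x.
normalized = (x - mean) / np.sqrt(var)
```

So with the same axis=-1, the sum result loses the last axis, while the normalization constants have only the last axis.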

The above comment applies to "my convention" too - for example, with a 10-D x, for np.sum(x, axis=(3, 4, 5)), we will do away with those 3 axes and leave only axes 0, 1, 2, 6, 7, 8, 9 before they are re-numbered in order.
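A quick way to check the 10-D case is to give every axis a different size so the surviving axes can be read off the result shape. The sizes below are arbitrary choices for illustration (axis i has size i + 1).

```python
import numpy as np

# Axis i has size i + 1, so the shape is (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).
shape = tuple(range(1, 11))
x = np.ones(shape, dtype=np.int8)

# Do away with axes 3, 4 and 5 (sizes 4, 5 and 6).
result = np.sum(x, axis=(3, 4, 5))

print(x.ndim, result.ndim)
print(result.shape)   # sizes of the surviving axes 0, 1, 2, 6, 7, 8, 9
```

Each entry of result aggregates 4 * 5 * 6 = 120 ones, and the remaining axes are re-numbered 0 to 6.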

Nice explanation, @tarunsaxena1000, and it is always good to read different views.

Btw, @tarunsaxena1000, my "do away with" description makes even more sense if you check out the NumPy documentation for sum or other similar operations, because you will see a parameter called "keepdims=False". It carries the meaning that, if you don't keep the dims, they are completely done away with; otherwise, you still get to keep the dim with size 1 (but not the values, because they have been aggregated).
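For anyone who wants to see the difference, here is a minimal sketch of that parameter with np.sum. The array values are made up for illustration.

```python
import numpy as np

x = np.array([[1.0, 3.0],
              [2.0, 6.0]])

# Default (keepdims=False): the summed axis is removed entirely.
dropped = np.sum(x, axis=-1)
print(dropped.shape)

# keepdims=True: the axis stays with size 1, holding the aggregated value.
kept = np.sum(x, axis=-1, keepdims=True)
print(kept.shape)

# The size-1 axis makes broadcasting against x straightforward,
# e.g. turning each row into fractions of its row total.
row_fractions = x / kept
print(row_fractions)
```

With the axis kept, the division lines up row by row without any manual reshaping.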