In the code above, X (the training set) has shape (200, 2) because it contains 200 training examples with 2 features each. In the Coffee Roasting Lab, tf.keras.layers.Normalization is called with axis=-1, which does not make sense to me. I believe it should be axis=0, since we want to calculate the mean and variance of each feature across all the training examples. Can someone please explain how axis=-1 works here?
I tried axis=0, but I get this error:
All `axis` values to be kept must have known shape. Got axis: (0,), input shape: [None, 2], with unknown axis at index: 0
I always suggest reading the documentation when something is unclear. Let me quote its description of the axis argument:
The axis or axes that should have a separate mean and variance for each index in the shape. For example, if shape is (None, 5) and axis=1, the layer will track 5 separate mean and variance values for the last axis… Defaults to -1, where the last axis of the input is assumed to be a feature dimension and is normalized per index.
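TensorFlow reserves axis 0 for the samples (the batch dimension) and does not fix how many there are. Although your training set has 200 samples, TensorFlow lets the number of samples fed in be anything, which keeps the layer flexible for predicting on any number of test cases and for any mini-batch size. The None in the shape means "any number" in this context, and that is why axis=0 fails: the layer would need one mean and variance per index along an axis whose length is unknown. Note also that axis here names the axis that keeps separate statistics, not the axis that is averaged over as in NumPy. With axis=-1 the layer keeps one mean and variance for each of the 2 features, computed over all the training examples, which is exactly what you want.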
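To check this yourself, here is a minimal sketch that compares the statistics the layer learns with axis=-1 against NumPy's per-column statistics. The data is synthetic and only stands in for the lab's (200, 2) training set, so the numbers themselves are an assumption; only the shapes and the comparison matter.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the lab data: 200 samples, 2 features (values are made up).
rng = np.random.default_rng(0)
X = rng.normal(loc=[200.0, 13.0], scale=[30.0, 1.0], size=(200, 2)).astype("float32")

# axis=-1: keep a separate mean/variance for each index of the last (feature) axis.
norm_l = tf.keras.layers.Normalization(axis=-1)
norm_l.adapt(X)  # learns the statistics from all 200 samples

# Flatten the stored statistics so they can be compared with NumPy's.
layer_mean = np.asarray(norm_l.mean).reshape(-1)
layer_var = np.asarray(norm_l.variance).reshape(-1)

# In NumPy the same per-feature statistics are written with axis=0,
# because NumPy's axis names the axis that is reduced over.
print("layer mean:", layer_mean, " numpy mean:", X.mean(axis=0))
print("layer var: ", layer_var, " numpy var: ", X.var(axis=0))

# The adapted layer accepts any number of samples along axis 0.
Xn_200 = np.asarray(norm_l(X))
Xn_5 = np.asarray(norm_l(X[:5]))
print(Xn_200.shape, Xn_5.shape)
```

So axis=-1 in Keras and axis=0 in NumPy describe the same computation from two sides: Keras names the axis that is kept, NumPy names the axis that is collapsed.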
As for the 2, it is the shape of a single sample, (2,), which is the only thing you need to tell TensorFlow as the input_shape of the model.
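As a small illustration of the per-sample shape, the sketch below builds a tiny model that is told only (2,) and then inspects the shape Keras reports. The layer sizes are arbitrary choices for the example, not necessarily the lab's architecture.

```python
import numpy as np
import tensorflow as tf

# Only the shape of ONE sample is given; the batch axis is left open.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(3, activation="sigmoid"),  # illustrative layer sizes
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Keras prepends the unknown batch dimension as None.
batch_shape = tuple(model.input_shape)
print(batch_shape)

# The same model handles 1 sample or 7 samples without any change.
out_1 = np.asarray(model(np.zeros((1, 2), dtype="float32")))
out_7 = np.asarray(model(np.zeros((7, 2), dtype="float32")))
print(out_1.shape, out_7.shape)
```

The None that Keras adds in front of the 2 is the same None that appears in the error message for axis=0.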