Normalization v.s. Standardize

Yuchen_Zhang · October 5, 2021, 10:52pm

In the two assignments in week 2, we performed normalization on each row of the X matrix (i.e. normalizing each feature), and standardized each column of the X matrix (i.e. standardizing each training example).

Why should we normalize the rows and standardize the columns?

Why shouldn’t we normalize the columns and standardize the rows?

Thanks.

kenb · October 6, 2021, 2:22pm

Hi, @Yuchen_Zhang. I am not sure what you mean. In the (graded) assignment, Exercise 2, you are presented with a filled-in cell in which you “standardize” the data:

train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.

The division by 255 is an element-by-element operation. Every value in the ..._flatten matrices is divided by 255 (pixel intensities range from 0 to 255) so that the values lie between 0 and 1 (endpoints included). Each column represents an image (with 12288 pixel values), so the number of columns is the number of example images. The key point here is that every element of a ..._flatten matrix represents a pixel intensity.
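To make the element-wise scaling concrete, here is a minimal sketch on a toy matrix. The name toy_flatten and its values are made up for illustration; it only stands in for the assignment's ..._flatten matrices, which have 12288 rows.

```python
import numpy as np

# Hypothetical stand-in for a ..._flatten matrix:
# 4 pixel values (rows) for 3 images (columns).
toy_flatten = np.array([[0, 128, 255],
                        [64, 32, 16],
                        [255, 255, 0],
                        [10, 20, 30]], dtype=np.uint8)

# Dividing by the float 255. acts on every element and yields a float array.
toy_scaled = toy_flatten / 255.

print(toy_scaled.shape)                    # same shape as the input
print(toy_scaled.min(), toy_scaled.max())  # smallest and largest scaled values
```

The shape is unchanged, and no row-wise or column-wise statistic is involved: every entry is scaled by the same constant.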

The point of confusion may be as follows. Typically, a standardization operation involves a bit more than that, because the goal is for each feature (a row in the feature matrix) to be measured in comparable units. After all, features are not generally pixel values of an image, which already have comparable units. For example, in house-price prediction, the features may be quantities such as square footage, number of rooms, number of baths, acreage, etc.

As an example, if X is an n_x \times m feature matrix, we might do this by subtracting the mean of each feature and then dividing by its standard deviation. To do this we need the row-wise means and standard deviations: X_mean = np.mean(X, axis=1, keepdims=True) and X_stdev = np.std(X, axis=1, keepdims=True). The standardized feature matrix then becomes

(X-X_{mean})/X_{stdev}.

Note that a small computational miracle happens here. As noted earlier, X is an n_x \times m matrix, but X_{mean} and X_{stdev} are n_x \times 1 column vectors (keepdims=True keeps them in that shape; without it they would have shape (n_x,) and would not line up with the rows of X). In the numerator, a “broadcasting” operation automatically subtracts the X_{mean} vector from each column of the X matrix, and a similar broadcasting operation handles the division.
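As a rough sketch of the row-wise standardization described above, here is a small made-up feature matrix in the spirit of the house-price example. The numbers are invented, and keepdims=True is used so that the means and standard deviations have shape (n_x, 1) and broadcast across the columns of X.

```python
import numpy as np

# Hypothetical feature matrix: 3 features (rows) x 4 houses (columns).
X = np.array([[1500., 2000., 1200., 1800.],   # square footage
              [3., 4., 2., 3.],               # number of rooms
              [0.25, 0.50, 0.10, 0.30]])      # acreage

# Row-wise statistics; keepdims=True keeps the shape (3, 1) instead of (3,).
X_mean = np.mean(X, axis=1, keepdims=True)
X_stdev = np.std(X, axis=1, keepdims=True)

# Broadcasting stretches the (3, 1) arrays across all 4 columns.
X_standardized = (X - X_mean) / X_stdev

print(X_mean.shape, X_stdev.shape)
print(X_standardized.mean(axis=1))  # each feature's mean is now close to 0
print(X_standardized.std(axis=1))   # each feature's standard deviation is now close to 1
```

Without keepdims=True, the statistics would have shape (3,), and X - X_mean would raise a broadcasting error here, because NumPy would try to match those 3 entries against the 4 columns rather than the 3 rows.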

I hope this helps!
