Why do we normalize the whole dataset, vs individual features?

AnshG714 · October 16, 2023, 4:40am

The assignment does normalization using the following code:

X_multi_norm = (X_multi - np.mean(X_multi))/np.std(X_multi)
Y_multi_norm = (Y_multi - np.mean(Y_multi))/np.std(Y_multi)

Why are we normalizing the values with respect to the mean of the whole dataset, as opposed to the values in individual columns? For instance, GrLivArea and OverallQual have different units of measurement, so shouldn’t we normalize these columns individually? Or is np.mean/np.std already doing this for us?

TMosh · October 16, 2023, 4:58am

Yes. The means are computed for each feature.
For example, np.mean(X_multi) returns a vector which holds the mean for each feature.

TMosh · October 16, 2023, 6:15pm

To illustrate:

np.mean(X_multi) gives a vector of two values - one for each feature.

AnshG714 · October 17, 2023, 4:00am

Awesome, that makes sense! Is this specific to operations on pandas DataFrames? According to the numpy.mean page of the NumPy v1.26 Manual, if no axis is specified, it takes the mean of the flattened array.
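For reference, here is a minimal sketch of the documented behavior on a plain NumPy array. The array `X` and its values are invented for illustration and are not the assignment data; they just mimic one large-scale feature and one small-scale feature.

```python
import numpy as np

# Hypothetical two-feature data: living area (sq ft) and a 1-10 quality score.
X = np.array([[1500.0, 5.0],
              [2000.0, 7.0],
              [2500.0, 9.0]])

# No axis: the array is flattened, so this is the mean of all six values.
overall_mean = np.mean(X)

# axis=0: one statistic per column, i.e. per feature.
column_means = np.mean(X, axis=0)
column_stds = np.std(X, axis=0)

# Per-feature standardization; broadcasting applies each column's
# mean and standard deviation to that column only.
X_norm = (X - column_means) / column_stds

print(overall_mean)   # a single scalar
print(column_means)   # one entry per feature
print(X_norm)
```

On an ndarray, then, per-feature normalization needs an explicit `axis=0`; without it both features would be shifted and scaled by the same two numbers.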

TMosh · October 17, 2023, 4:34am

The course uses NumPy version 1.20.1. That’s what the Coursera Labs platform provides.

I’m not sure exactly how it’s documented. It may have to do with the default behavior of the keepdims parameter, but I don’t really know.

An experiment is often more useful than the documentation.
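One such experiment might look like the sketch below. It assumes X_multi is a pandas DataFrame; the column names come from this thread, but the values are invented. As far as I can tell, np.mean hands inputs that are not plain ndarrays to the object's own mean method, which would explain why a DataFrame can behave differently from the documented ndarray case. The result of calling np.mean on a DataFrame may also depend on the installed pandas version, so the sketch sets the axis explicitly rather than relying on a default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_multi: column names from the thread, values invented.
X_multi = pd.DataFrame({
    "GrLivArea": [1500.0, 2000.0, 2500.0],
    "OverallQual": [5.0, 7.0, 9.0],
})

# Explicit axis=0 yields one statistic per column (feature).
col_mean = X_multi.mean(axis=0)
# ddof=0 matches np.std's default; pandas' own default is ddof=1.
col_std = X_multi.std(axis=0, ddof=0)

# Subtracting a Series from a DataFrame aligns on column labels,
# so each column is normalized with its own mean and std.
X_multi_norm = (X_multi - col_mean) / col_std

print(col_mean)                           # one mean per feature
print(X_multi_norm.mean(axis=0))          # close to 0 for each column
print(X_multi_norm.std(axis=0, ddof=0))   # close to 1 for each column
```

Printing `np.mean(X_multi)` next to `col_mean` in the lab environment would show directly whether the assignment's one-liner is normalizing per feature there.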
