First step of PCA

The first step of PCA is to normalize the data.
In the programming assignment, we are not required to divide by the standard deviation after subtracting the mean:

X_demeaned = (X - X.mean(axis=0))

whereas I thought it should have been:
X_demeaned = (X - X.mean(axis=0))/X.std(axis=0)

Why is that?

Hey @mc04xkf,
If we take a close look at the markdown, we will find the following:

Mean normalize the data

In other words, we only have to transform the data so that the mean can be 0. We don’t have to set the variance to 1 in this case. Let us know if this helps.
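To make this concrete, here is a tiny sketch (with made-up numbers, not the assignment's data) of what "mean normalize" does: after subtracting the per-column mean, every feature has mean 0, but the variances are left untouched.

```python
import numpy as np

# Toy data: 4 samples, 3 features (hypothetical values, not from the assignment)
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])

# Mean-normalize: subtract the per-column mean so each feature's mean becomes 0
X_demeaned = X - X.mean(axis=0)

print(X_demeaned.mean(axis=0))  # each column mean is now 0
print(X_demeaned.std(axis=0))   # the spreads are unchanged, not forced to 1
```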

Cheers,
Elemento

Hi @mc04xkf,

I will try to complement Elemento’s answer and address the “why”.

Dividing by the standard deviation matters when features have different units (for example “height”, “salary”, etc.). But in our case the word-embedding features essentially share the same units and scale.

If we added a “salary” column to X, then we would definitely have had to standardize the matrix, because the variance of “salary” would be much higher than that of the other columns (and PCA is built on the covariance matrix).
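A quick sketch of that scenario (with synthetic data, standing in for real embeddings and salaries): the variance of the large-scale column dominates the diagonal of the covariance matrix unless we standardize.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical word-embedding-like features: all on a similar scale
emb = rng.normal(0.0, 1.0, size=(100, 3))

# Add a "salary" column with a much larger scale
salary = rng.normal(50_000.0, 10_000.0, size=(100, 1))
X = np.hstack([emb, salary])

# Covariance of the merely de-meaned data: the salary variance (~1e8)
# swamps the embedding variances (~1)
cov = np.cov(X - X.mean(axis=0), rowvar=False)
print(np.diag(cov))

# After standardizing, every feature contributes comparably (all close to 1)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.diag(np.cov(X_std, rowvar=False)))
```

With the raw scales, the first principal component would point almost entirely along the salary axis; standardizing removes that artifact.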

Have you tried it? In cases like these I would encourage you to try it yourself :slight_smile: . Here is a fun counter-question:
Which of these (in the picture) are:

  • de-meaned = X - X.mean(axis=0),
  • standardized = (X - X.mean(axis=0)) / X.std(axis=0),
  • not changed = X.

Note the magnitude of the x and y axes. Try to reason about your guesses before you check :slight_smile:
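If you want to check your guesses without plotting, a sketch like this (on made-up 2-D data with a nonzero mean and unequal spreads) prints the tell-tale statistics behind the axis ranges: de-meaning re-centres the cloud at 0, and standardizing additionally equalizes the spreads.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-D data: mean [5, -3], per-axis spreads [2.0, 0.5]
X = rng.normal(loc=[5.0, -3.0], scale=[2.0, 0.5], size=(200, 2))

for name, data in [("not changed", X),
                   ("de-meaned", X - X.mean(axis=0)),
                   ("standardized", (X - X.mean(axis=0)) / X.std(axis=0))]:
    # The axis ranges in the picture follow from these two statistics
    print(f"{name:>12}: mean={data.mean(axis=0).round(2)}, "
          f"std={data.std(axis=0).round(2)}")
```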