NLP C1 W3: a mistake in the PCA algorithm?

The video and slides show that the normalization step includes division by the standard deviation. However, in the assignment this step is omitted. Also, Wikipedia's article talks only about subtracting the mean: "we must first center the values of each variable in the dataset on 0 by subtracting the mean of the variable's observed values from each of those values".
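
For reference, here is a minimal NumPy sketch of the mean-centering-only variant that the assignment (and the Wikipedia quote) seem to describe. The helper name and signature below are my own, not the assignment's:

```python
import numpy as np

def pca_mean_center(X, n_components=2):
    # PCA with mean-centering only (no division by std),
    # as in the Wikipedia quote above. Hypothetical helper,
    # not the assignment's function.
    # X: (n_samples, n_features)
    X_centered = X - X.mean(axis=0)           # subtract each feature's mean
    cov = np.cov(X_centered, rowvar=False)    # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]         # descending explained variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components            # project onto top components
```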

As I understand it, if we divide each feature by its standard deviation, we equalize the variance across features: the embeddings get closer to an n-dimensional "sphere" rather than an ellipse. I'm not sure whether this can result in picking different eigenvectors or whether it just changes the scale of the result.
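
To make this concrete, here is a small NumPy experiment (toy data and helper of my own, not course code) comparing the top eigenvector with and without the division by std. When features sit on very different scales, the two directions genuinely differ (up to sign), so it is not just a rescaling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated features on very different scales (toy data).
x = rng.normal(size=500)
X = np.column_stack([10.0 * x, x + rng.normal(size=500)])

def top_eigvec(M):
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]          # eigenvector of largest eigenvalue

Xc = X - X.mean(axis=0)                      # mean-centering only
Xs = Xc / Xc.std(axis=0)                     # ... plus division by std

print(top_eigvec(np.cov(Xc, rowvar=False)))  # ~[1.0, 0.1]: hugs the large-scale axis
print(top_eigvec(np.cov(Xs, rowvar=False)))  # ~[0.71, 0.71]: mixes both features
```

On this data the mean-centered version is dominated by the large-scale feature, while the standardized version weighs both features evenly, so the projections differ too.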

Hi ruzakirov,

Whether or not you want to perform normalization depends on the meaning of the values and on how much they vary. See this discussion.

I’ve read a few topics on the subject. My problem is that the lecture and the programming assignment are not clear on it. The lectures don’t cover the topic well enough to complete the assignment. I don’t like poking around to figure out what is meant by “de-meaning”, especially when I also have to figure out other things in the assignment that are not very clear, since it is all mostly new to me.

This is the second specialization I’m taking from you guys. So far I don’t like the quality of the lectures in the NLP specialization. I’m not sure exactly why I feel this way. Maybe because the videos feel too superficial. Quite simple themes are split into many short videos, and there is text after each video instead of a combined text after a subject or at the end of the week, so you cannot watch one theme in one go without interruption.

Maybe it’s just me, I don’t know.

Hi ruzakirov,

Thank you for sharing your opinion. This type of feedback is very valuable, and will certainly help in improving the courses going forward!

Yes, you are right. If you include the division by the standard deviation, the final results are not the same as the expected ones. I tried both versions outside Coursera and got different results. However, I just used the same formula that was used in the two labs.