NLP C1 W3: a mistake in the PCA algorithm?

The video and slides show that the normalization step includes division by the standard deviation. However, in the assignment this step is omitted. Also, Wikipedia's article talks only about subtracting the mean: "we must first center the values of each variable in the dataset on 0 by subtracting the mean of the variable's observed values from each of those values".
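
For reference, here is a minimal NumPy sketch of the mean-centering-only variant that the assignment (and the Wikipedia quote) seem to describe. The helper name and signature below are my own, not the assignment's:

```python
import numpy as np

def pca_mean_center(X, n_components=2):
    # PCA with mean-centering only (no division by std),
    # as in the Wikipedia quote above. Hypothetical helper,
    # not the assignment's function.
    # X: (n_samples, n_features)
    X_centered = X - X.mean(axis=0)           # subtract each feature's mean
    cov = np.cov(X_centered, rowvar=False)    # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]         # descending explained variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components            # project onto top components
```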

As I understand it, if we divide each feature by its standard deviation, we equalize the variance across features: the embeddings get closer to an n-dimensional "sphere" rather than an ellipse. I'm not sure whether this can result in picking different eigenvectors or whether it just changes the scale of the result.
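
To make this concrete, here is a small NumPy experiment (toy data and helper of my own, not course code) comparing the top eigenvector with and without the division by std. When features sit on very different scales, the two directions genuinely differ (up to sign), so it is not just a rescaling:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated features on very different scales (toy data).
x = rng.normal(size=500)
X = np.column_stack([10.0 * x, x + rng.normal(size=500)])

def top_eigvec(M):
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(vals)]          # eigenvector of largest eigenvalue

Xc = X - X.mean(axis=0)                      # mean-centering only
Xs = Xc / Xc.std(axis=0)                     # ... plus division by std

print(top_eigvec(np.cov(Xc, rowvar=False)))  # ~[1.0, 0.1]: hugs the large-scale axis
print(top_eigvec(np.cov(Xs, rowvar=False)))  # ~[0.71, 0.71]: mixes both features
```

On this data the mean-centered version is dominated by the large-scale feature, while the standardized version weighs both features evenly, so the projections differ too.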

Hi ruzakirov,

Whether or not you want to perform normalization depends on the meaning of the values and on how much they vary. See this discussion.

I’ve read a few topics on the subject. My problem is that the lecture and the programming assignment are not clear on it. The lectures don’t cover the topic well enough to complete the assignment. I don’t like poking around to figure out what is meant by “de-meaning”, especially when I also have to figure out other things in the assignment that are not very clear, since it is all mostly new to me.

This is the second specialization I’m taking from you guys. So far I don’t like the quality of the lectures in the NLP specialization. I’m not sure exactly why I feel this way. Maybe because the videos feel too superficial. Quite simple themes are split into many short videos, and there is text after each video instead of a combined text after a subject or at the end of the week, so you cannot watch one theme in one go without interruption.

Maybe it’s just me, I don’t know.

Hi ruzakirov,

Thank you for sharing your opinion. This type of feedback is very valuable, and will certainly help in improving the courses going forward!

Yes, you are right. If you include the division by the standard deviation, the final results are not the same as the expected ones. I tried both versions outside Coursera and got different results. However, I just used the same formula that was used in the two labs.