C3_W2_Principal Component Analysis

Hello, in the Week 2 Principal Component Analysis, PCA Algorithm lecture, we were told that two preprocessing steps should be done before applying the algorithm:

  1. Make your data have zero mean,
  2. Make your features have a similar range.

The second one I do understand, since PCA works (kind of) by maximizing variance, which requires calculating Euclidean distances. But the first one I can’t really understand: why do we need to force the data to have zero mean? Does it make any difference?

Thanks!

Hi @jaejun02
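One of the reasons for requiring zero mean in PCA is that PCA uses the covariance matrix of the data. The usual shortcut of computing (1/m) XᵀX gives the covariance matrix only when every feature has zero mean. If the data is not centered (i.e., if the means are not zero), that matrix is skewed by the mean values, and the first principal component tends to point from the origin toward the mean of the data instead of along the direction of greatest variance, leading to incorrect principal components.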
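To make this concrete, here is a minimal NumPy sketch (not the course's code). The helper name `top_direction` and the toy data are my own assumptions for illustration: the points are spread along the (1, -1) diagonal but sit around (10, 10), far from the origin.

```python
import numpy as np

def top_direction(X, center=True):
    """First principal direction of X (rows = samples), as a unit vector.

    Hypothetical helper for illustration only.
    """
    if center:
        X = X - X.mean(axis=0)  # subtract each feature's mean
    # (1/m) X^T X: this is the covariance matrix only if X has zero mean.
    sigma = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                        # eigenvector of the largest one
    return v if v[0] >= 0 else -v             # fix the sign for readability

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Spread along (1, -1), small noise, then shifted to be centered near (10, 10).
X = np.column_stack([t, -t]) + 0.1 * rng.normal(size=(500, 2)) + np.array([10.0, 10.0])

centered = top_direction(X, center=True)
uncentered = top_direction(X, center=False)
print("with centering:   ", centered)
print("without centering:", uncentered)
```

With centering, the direction should come out close to the (1, -1) diagonal, which is where the variance really is. Without centering, it should come out close to (1, 1), which is just the direction from the origin to the mean of the data.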

Thank you so much for the reply. I'll try exploring a bit more and come back if I have further questions!
