PCA/SVD square matrix requirement

Hi all,

I cannot make full sense of the square matrix requirement for applying PCA. In the graded lab we had the example of applying it to an n x 8 dataset, while SVD was applied to the image dataset, where each image is a 1 x 64 array.

The problem is that I cannot fully understand the argument provided for this in the lab - see below.

Could anyone expand on this so that I can understand the restriction?

Thanks in advance!

Lab explanation (most confusing section highlighted)

"You might think that since every digit is 8x8 this will be a square matrix and thus PCA might be a better choice. However, each digit is represented as a 1x64 array. Also you might wonder why the first example worked with PCA if the data had far more observations than features. The reason is that when performing PCA you end up using the matrix product XtX which is a square matrix. The above is a consequence of the nature of the problems. The pulsar star dataset had numerical data to represent each data point. On the other hand, this dataset represents images through pixels"

Hi @sergio.ortiz ,

Welcome to discourse and thanks for your excellent question!
I agree that the explanation in the colab is not very clear on why PCA is preferable for the pulsar dataset (n x 8) and truncated SVD is better on the image dataset (n x 64).
I believe the colab means that SVD tends to work better with sparse data matrices, but as I am not a real expert here, I'd like to get a second opinion: @chris.favila, could you please help elaborate on why the colab (Ungraded lab: Algorithmic Dimensionality Reduction) claims that TruncatedSVD is better suited for the image dataset?

Just checked how PCA deals with the image dataset in colab and, as you can see below, it yields results similar to TruncatedSVD:
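A minimal sketch of that comparison (assuming sklearn's built-in `load_digits` as a stand-in for the lab's image data; the lab's exact preprocessing may differ) looks like this:

```python
# Minimal sketch, assuming sklearn's digits dataset (8x8 images flattened to 1x64)
# as a stand-in for the lab's image data.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD

X, _ = load_digits(return_X_y=True)          # X has shape (1797, 64), not square

pca = PCA(n_components=2).fit(X)
svd = TruncatedSVD(n_components=2).fit(X)

# Both keep a comparable amount of variance in the first two components
print("PCA :", pca.explained_variance_ratio_)
print("SVD :", svd.explained_variance_ratio_)
```

The two methods differ mainly in that TruncatedSVD does not center the data first, which is also why it copes better with sparse matrices.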

PCA and SVD are closely related anyway; I found the following resources very helpful for further building my intuition about the two methods:

Hope that helps…
Regards,
Maarten


Hi Maarten! Thanks for the heads up and the detailed answer to this question! Will report this to the devs for checking. Hope to update this soon. Thanks again!

Thanks for the information Maarten!

The part I found most difficult to understand is the statement about PCA and square matrices. Considering that most ML scenarios involve m samples of n dimensions, I cannot see how PCA would only be a good choice for batches with the same number of samples as dimensions.
Surely I misunderstood the text, as the example applies it in different scenarios with m samples.

Thanks again!

PCA does not require that the input matrix be square. The point is that PCA works by applying SVD (Singular Value Decomposition) to the Covariance Matrix of the input X. The Covariance Matrix is given by this formula:

\Sigma = \displaystyle \frac {1}{m} X^T \cdot X

X is not required to be a square matrix, but you can see that the Covariance Matrix is always a square matrix. Note that the product X^T \cdot X was also mentioned in the original quote that you gave.

Note that there are several ways to define this and it also depends on the orientation of your X “sample” matrix. The formulation I showed assumes that the orientation of X is m x n_x where m is the number of samples and n_x is the number of features in the input samples. The result is that \Sigma will be n_x x n_x.

If you are dealing with things the way Prof Ng defines them in DLS where the orientation of X is n_x x m, then you’d compute the covariance using X \cdot X^T so that you end up with the covariance having dimensions n_x x n_x.

There are also some formulations which handle the \frac {1}{m} differently, but in all cases you’re using the matrix product with the transpose to end up with a square matrix of dimension n_x x n_x. The point is that it is the feature dimensions that are being operated on by PCA.
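To make that concrete, here is a small illustrative numpy sketch (centering and the exact handling of the \frac {1}{m} factor are simplified): X itself is not square, but the product with its transpose always is.

```python
import numpy as np

m, n_x = 1000, 8                 # many samples, few features (like the pulsar dataset)
X = np.random.randn(m, n_x)      # X has shape (1000, 8), not square

Sigma = (X.T @ X) / m            # shape (8, 8): always n_x x n_x, i.e. square
print(Sigma.shape)
```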

Hi Paulin,

Did you actually read any of my messages? :slight_smile:

  • The point: I try to make sense of what is the criteria for selecting PCA or SVD.
  • The issue: unclear conclusions when drawing on different sources (course and links).

Initially, from reading the course I understood that there may be an issue applying PCA to non-square matrices. After reading the links in Maarten's kind response, I understood that you can apply either method using the Covariance Matrix trick, but this may affect performance.

My guess is that there is no clear selection criterion and the approach is to test both methods and decide according to effectiveness/performance.
Confirmation or hints on the selection criteria are welcome :slight_smile:

Thanks @chris.favila !
Was curious to know if you got an explanation from the devs in the meantime on this topic.

BR
Maarten

Hi Maarten! Sorry for the late reply. I checked the lab and it looks like the markdown is already edited for clarity. The team is a bit busy this week but I can follow up if needed. Thanks!

Thanks Chris, no hurry at all. I’m on holidays with very poor connectivity for the next weeks so it can wait… as I was curious about the background just wanted to make sure it wasn’t forgotten.
Have a great day
Maarten

Ping +1 for an update. Thank you.

So, the wording used in the lab made it look like one method was better than the other for every problem, or that only one would be suitable because of the nature of the data. This is NOT the case; you could apply both algorithms to both problems. The lab has now been updated to avoid this confusion.