I have a question about PCA. We obtain the eigenvectors and eigenvalues of the covariance matrix, so why is it that we can use these eigenvectors to project the original data? Doesn't the covariance matrix yield eigenvectors that are unrelated to the space containing the original data?
Also, I came up with this summary of PCA for myself, can anybody detect any mistakes in it? Thank you in advance!
An intuitive explanation of Principal Component Analysis
We collect samples and characterize them using a set of features. These characterized samples are our dataset, and they now exist in an N-dimensional space (i.e. the number of dimensions is equal to the number of features used to characterize each sample).
The motivation behind PCA is that we would like to reduce this multidimensional space in order to:
Remove features that don’t provide much information about why one sample is different from another sample
and/or
Allow a better visualization of the data
This reduction of the multidimensional space must be done in such a way that each sample can still be uniquely represented in the new, reduced space.
PCA is then a linear algebra trick that allows this reduction of the multidimensional space while still preserving a unique location for each sample. PCA achieves this trick by exploiting three things (I also wrote a small numerical check of the first two, shown right after this list):
The fact that the covariance matrix calculated for the original multidimensional space is a symmetric square matrix and thus yields eigenvectors that are orthogonal to each other, i.e. directions that show zero correlation with each other.
The fact that the eigenvectors of the covariance matrix conveniently span the entire original multidimensional space (they form a basis for it), while each eigenvector is also conveniently associated with an eigenvalue that describes the spread (variance) of the data along that direction.
The fact that projecting the data onto a subset of these new axes (i.e. eigenvectors) can reduce the dimensionality needed to represent the data.
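To convince myself of the first two points, I wrote a small NumPy check on toy random data (the data, the seed and the variable names are just my own illustration, not from any particular library):

```python
import numpy as np

# Toy data: 500 samples with 3 correlated features (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

cov = np.cov(X, rowvar=False)           # 3 x 3 covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh is meant for symmetric matrices

# Point 1: the covariance matrix is symmetric, so its eigenvectors are orthogonal
print(np.allclose(cov, cov.T))                      # True
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))  # True, i.e. V^T V = I

# Point 2: the 3 eigenvectors form a basis of the original 3-D feature space,
# so any sample can be expressed exactly in terms of them
sample = X[0]
coords = eigvecs.T @ sample                    # coordinates of the sample in that basis
print(np.allclose(eigvecs @ coords, sample))   # True: we recover the sample exactly
```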
So in summary (I also sketch these steps in code right after the list):
We calculate the covariance matrix of our data
We obtain the eigenvectors and eigenvalues of the covariance matrix
We use the eigenvalues to sort the eigenvectors in descending order
We choose how many dimensions/eigenvectors we want to keep (e.g. based on how much variance is captured by the selected dimensions)
We project the original data to the selected eigenvectors
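Here is how I picture those five steps, as a minimal NumPy sketch on made-up data (the function name pca_project and the choice of 2 components are just my own illustration):

```python
import numpy as np

def pca_project(X, n_components):
    """Sketch of the steps above: covariance, eigendecomposition,
    sort by eigenvalue, keep the top components, project."""
    X_centered = X - X.mean(axis=0)            # work with mean-centred data
    cov = np.cov(X_centered, rowvar=False)     # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 2: eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1]          # step 3: sort descending by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # step 4: variance captured per component
    components = eigvecs[:, :n_components]     #         keep the leading eigenvectors
    return X_centered @ components, explained  # step 5: project onto them

# Toy usage: 200 samples, 5 features, keep 2 components
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
scores, explained = pca_project(X, n_components=2)
print(scores.shape)        # (200, 2)
print(explained.round(3))  # fraction of total variance along each eigenvector
```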
The eigenvectors of the covariance matrix show directions in the original feature space, and since the covariance matrix is symmetric, these directions are orthogonal. This orthogonality helps us capture the most variance when projecting the data while still keeping each sample's unique representation.
Your summary is great! You can also mention that the eigenvalues tell us how much variance each eigenvector captures, which helps us decide how many dimensions to keep.
Hope that helps! Let me know if you have any questions.
Your summary of PCA is excellent, and you’ve accurately captured the key steps and motivations behind the technique.
Regarding your question about eigenvectors, you’re right to wonder how eigenvectors of the covariance matrix relate to the original data. Here’s the crucial insight:
Eigenvectors of the covariance matrix are orthogonal directions of maximum variance in the original data space.
To see why, recall that the covariance matrix Σ is defined as:
Σ = E[(X - μ)(X - μ)ᵀ]
where X is a (random) sample vector, μ is the mean vector, and E[·] denotes the expected value.
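In practice Σ is estimated from the samples themselves; here is a minimal NumPy illustration on made-up data (nothing to do with your dataset) of how that estimate lines up with the formula:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))   # 1000 samples, 3 features
mu = X.mean(axis=0)              # sample estimate of the mean vector mu
Xc = X - mu                      # (X - mu) for every sample

# Empirical version of  Sigma = E[(X - mu)(X - mu)^T]
sigma = Xc.T @ Xc / (len(X) - 1)

print(sigma.shape)                                  # (3, 3): features x features
print(np.allclose(sigma, np.cov(X, rowvar=False)))  # True: matches NumPy's estimate
```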
The eigenvectors of Σ represent directions in which the data varies most. Specifically:
The first eigenvector corresponds to the direction of maximum variance in the data.
The second eigenvector corresponds to the direction of maximum variance orthogonal to the first eigenvector.
And so on.
These eigenvectors are orthogonal because the covariance matrix is symmetric (Σ = Σᵀ), and a real symmetric matrix always admits an orthogonal set of eigenvectors.
Now, when you project the original data onto these eigenvectors, you’re essentially:
Rotating the data to align with the directions of maximum variance.
Reading off each sample's coordinates along these directions; the corresponding eigenvalues tell you how much variance the data has along each one (the projection itself does not rescale the data).
This rotation (change of basis) preserves the essential information in the data, and discarding the low-variance directions is what reduces the dimensionality.
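To make the "rotation, not rescaling" point concrete, here is a small NumPy check on toy data (the data and the two-component cut-off are arbitrary, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # correlated toy data
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Projecting onto all the eigenvectors is just a rotation: distances are preserved
Z = Xc @ eigvecs
print(np.allclose(np.linalg.norm(Xc, axis=1), np.linalg.norm(Z, axis=1)))  # True

# The variance along each new axis equals the corresponding eigenvalue
print(np.allclose(Z.var(axis=0, ddof=1), eigvals))  # True

# Keeping only the first two axes retains this fraction of the total variance
print(eigvals[:2].sum() / eigvals.sum())
```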
I hope this explanation has clarified your doubts. Should you have any further questions or require additional clarification, please do not hesitate to ask.
Thanks for the insight. After thinking about it again, I realized that the covariance matrix has the same number of columns as the original data, which means that they both "exist" in a space with the same number of dimensions. This made me realize that the eigenvectors of the covariance matrix can also describe every point in the space of the original data. In short, the original data and the covariance matrix share the same number of dimensions and thus the same multidimensional space.
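A quick shape check (toy numbers, just to convince myself) makes this concrete:

```python
import numpy as np

X = np.random.default_rng(3).normal(size=(100, 4))   # 100 samples, 4 features
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

print(X.shape)        # (100, 4): the samples live in a 4-dimensional feature space
print(cov.shape)      # (4, 4): one row/column per feature, i.e. the same 4 dimensions
print(eigvecs.shape)  # (4, 4): each eigenvector is itself a 4-dimensional direction
```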