In the video "PCA Algorithm" of the PCA section, Prof. Andrew talks about an arrow vector of length 1 pointing in the direction of the z-axis.
So my questions are: how is this arrow vector obtained, and why do we take the dot product to project onto the z-axis?
The idea of PCA in a nutshell is to reduce the number of dimensions required to describe a point by approximating it as a projection onto a lower-dimensional space.
In this case, the point is located at (2, 3), which requires two dimensions (x1-axis and x2-axis) to describe, and we are trying to approximate it by projecting it onto a one-dimensional space (a line) described by the unit vector z. The direction of the vector z, I believe, was chosen arbitrarily to illustrate an example.
Taking the dot product of the point [2, 3] and the unit vector [0.71, 0.71] gives you 3.55, which is the distance from the origin along the line described by the vector z. Now, instead of requiring two values (2, 3) to describe the point in the x1, x2 plane, we can use just one value (3.55) to describe the point on the line indicated by z.
Incidentally, the unit vector [0.71, 0.71] is actually [1/sqrt(2), 1/sqrt(2)] rounded to two decimal places, so the dot product is really 2.5·sqrt(2), which is closer to 3.5355. And if you multiply z by this dot product, you get [2.5, 2.5], which are the actual coordinates of the projection point used to approximate the original point [2, 3].
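If you want to verify those numbers yourself, here is a quick sketch with NumPy:

```python
import numpy as np

x = np.array([2.0, 3.0])               # original point in the x1, x2 plane
u = np.array([1.0, 1.0]) / np.sqrt(2)  # unit vector along z, ~[0.71, 0.71]

z = x @ u                              # dot product = distance from the origin along the line
print(z)                               # 3.5355... (i.e. 2.5 * sqrt(2))

projection = z * u                     # projection point back in 2-D coordinates
print(projection)                      # [2.5, 2.5]
```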
Hope this helps.
It’s actually obtained by solving an eigenvalue problem. More info on the steps involved, as well as some background on eigenvectors etc., can be found here.
Usually we want to reduce the dimensionality of the feature space to get a better ratio of data to dimensions (which often helps to mitigate overfitting when dealing with a limited amount of data). We can get rid of redundant information in our features by transforming to a smaller space that is spanned by the principal components (a subset of the eigenvectors of the previously mentioned problem); see also this thread: Does embedding projector use dimensional reduction? - #4 by Christian_Simonis
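To make the eigenvalue connection concrete, here is a minimal sketch (using NumPy on a made-up 2-D dataset): the first principal component is the unit eigenvector of the data's covariance matrix with the largest eigenvalue, and that is the direction the points get projected onto.

```python
import numpy as np

# Made-up 2-D data, just for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
u = eigvecs[:, -1]                      # unit eigenvector with the largest eigenvalue
print(u)                                # direction of maximum variance (length 1)

z = Xc @ u                              # 1-D representation of every point on that line
```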
Here is some example code which you can use to play around with and, e.g., check how much information is explained by each of the principal components. A minimal sketch along those lines, using scikit-learn on a small synthetic dataset:
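```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 5 features, but most of the variance lives in 2 latent directions
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

pca = PCA()  # keep all components so we can inspect each one
pca.fit(X)

# Fraction of the total variance explained by each principal component
print(pca.explained_variance_ratio_)
print(np.cumsum(pca.explained_variance_ratio_))
```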
(As you see, the last PCs do not provide too much value here information-wise…)
Best regards
Christian
I'm not so sure, but I feel you could think of z as the span of the vector [0.71, 0.71] and use the projection formula (x·u / u·u) u to get the projection point, since you can consider [0.71, 0.71] as a basis of z. And because the norm of [0.71, 0.71] is 1, the projection formula simplifies to (x·u) u, so the projection point is 3.55 · [0.71, 0.71]. Since [0.71, 0.71] has length 1, the distance along the line is literally 3.55.
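A quick numerical check of that formula (a small sketch using NumPy):

```python
import numpy as np

x = np.array([2.0, 3.0])
u = np.array([0.71, 0.71])       # basis vector of the line z (approximately unit length)

proj = (x @ u) / (u @ u) * u     # general projection formula: (x.u / u.u) u
print(x @ u)                     # ~3.55, the distance along z since |u| is ~1
print(proj)                      # ~[2.5, 2.5], the projection point
```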