What is the length 1 vector in PCA?

I don't quite understand how we get the 0.71 vector, or what it is at all. I also don't understand how it differs from the orange 3.55 vector. Isn't that also a length that helps us get the Z axis and the initial data?


Hi @someone555777 ,

If you refer back to the video, from timestamp 8:26 onwards, Prof Ng says roughly: "Suppose PCA found the Z axis. The length 1 vector is the vector pointing in the direction of the Z axis, so the values of that length 1 vector are what gets found. To project X, with coordinates [2, 3], onto the Z axis, you use the dot product calculation, which gives a value of 3.55. What this represents is that X now has a new, single-number representation on the Z axis, which is 3.55. The 3.55 is the length along the Z axis from the origin."
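In case numbers help, here's a minimal NumPy sketch of that dot-product projection (the variable names are my own, not from the slide):

```python
import numpy as np

x = np.array([2.0, 3.0])               # the original 2-D point from the slide
u = np.array([1.0, 1.0]) / np.sqrt(2)  # length 1 vector along the Z axis, i.e. [0.707, 0.707]

z = np.dot(x, u)                       # projection of x onto the Z axis
print(u)  # [0.70710678 0.70710678]
print(z)  # 3.5355..., which the slide rounds to 3.55
```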

It's still not clear to me where these 0.71 (0.707) numbers come from.


So, is the length 1 vector something like a multiplier used to get the projection (the orange vector)? Should it contain the same numbers as the point? And how do we compute this length 1 vector at all, if it doesn't contain similar numbers?

I just came across the same problem, and I did some research myself.

"Length 1 vector" most likely means a unit vector, because a unit vector is a vector with length (magnitude) 1. The slide doesn't show the calculation of the 0.71 numbers, but I found this article that explains how to calculate a unit vector, with examples. Hope it helps!
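For anyone who wants to see the arithmetic, here's a small sketch of that normalisation in NumPy, assuming the Z axis points along the 45-degree direction [1, 1] as in the slide:

```python
import numpy as np

v = np.array([1.0, 1.0])   # any vector pointing along the 45-degree Z axis direction
u = v / np.linalg.norm(v)  # divide by its length (magnitude) to make it length 1
print(u)                   # [0.70710678 0.70710678] -> the 0.71 values on the slide
print(np.linalg.norm(u))   # 1.0 (up to floating-point rounding)
```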


Thanks, Cyrus @cyruscsc! And let me also add one point - it needs to be a unit vector if the goal is to calculate the projection.


Can someone clarify this a little more? I reviewed this lesson on unit vectors at Khan Academy, but I'm still a bit fuzzy on where [0.71, 0.71] came from. A unit vector has magnitude 1; wouldn't the unit vector for [2, 3] (with magnitude sqrt(13)) be (2/sqrt(13), 3/sqrt(13))?

Edit: NVM, I see the magnitude/hypotenuse length is ~1 for (0.71, 0.71)


I believe that normalising each principal component to unit length ensures that every principal component has the same magnitude. Doing so means we only need to pay attention to the direction of each principal component's axis to determine the variance along that axis; that is, there's no additional scaling. Choosing unit vectors also lets us compute the principal components by simply calculating the eigenvectors of the covariance matrix of the higher-dimensional data, as in the sketch below.
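To illustrate that last point, here is a rough sketch using made-up toy data (not from the course) that obtains the principal directions as unit-length eigenvectors of the covariance matrix:

```python
import numpy as np

# toy 2-D data, purely hypothetical, just to illustrate the eigenvector route
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0],
                                          [1.0, 1.0]])
X = X - X.mean(axis=0)                  # centre the data first

cov = np.cov(X, rowvar=False)           # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are the principal directions

print(np.linalg.norm(eigvecs, axis=0))  # [1. 1.] -> the eigenvectors come out with length 1
print(eigvecs[:, np.argmax(eigvals)])   # unit vector along the first principal component
```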

Where did 0.71, 0.71 come from? Prof Ng simply stated that the principal component lay on that axis. It could have been something different, but saying the PC axis bisects x1 and x2 made the example easier to follow along with.

Imagine a right-angled triangle with both sides equal to 1/sqrt(2), and 1/sqrt(2) ~ 0.71. Using Pythagoras' theorem, the hypotenuse = sqrt(0.5 + 0.5) = 1, so it satisfies the unit vector requirement. If we wanted to calculate it, we'd just solve the equation 1 = a^2 + a^2 → a = 1/sqrt(2). Hope this helps.
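A quick numeric check of that, if it helps:

```python
import numpy as np

a = 1 / np.sqrt(2)           # solving 1 = a**2 + a**2
print(round(a, 2))           # 0.71
print(np.sqrt(a**2 + a**2))  # ~1.0 (up to floating-point rounding), so [a, a] is a unit vector
```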

One other thing that occurred to me, which may help someone else who was stuck on this: the point (2, 3) is not on the Z axis. It is drawn very close to it in the image referenced above, but it isn't actually on it, which led to some confusion. There's a slide later in the video that makes this visually clearer. He could have picked any arbitrary point, like (10, 1), that is more clearly not on the Z axis. He's just showing that scaling the unit vector on Z by the projection value still gives a point on the Z line, with that magnitude. He draws it more clearly later in the video:
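A small sketch of that idea, using the (10, 1) point as a hypothetical example:

```python
import numpy as np

u = np.array([1.0, 1.0]) / np.sqrt(2)  # length 1 vector along the Z axis
p = np.array([10.0, 1.0])              # a point that is clearly NOT on the Z axis

z = np.dot(p, u)                       # its single-number coordinate on the Z axis
on_axis = z * u                        # scale the unit vector by z: this point IS on the Z axis
print(z)        # 7.778...
print(on_axis)  # approximately [5.5, 5.5] -> lies on the 45-degree Z line, unlike (10, 1)
```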