Help with multiplying eigen vectors to obtain X_reduced

Hello. I can’t seem to figure out how to get X_reduced into the right shape. Whenever I try to multiply the transposes, I keep getting shape mismatches.

Right now, the unit test runs with 4 passes and 2 failures (similar to what others have reported).

Any help out there?

My notebook id is: twbrijcf


I am not a mentor for NLP and only got as far as Week 2 of NLP C1, so I can’t directly help. But just as a general matter, you’re not helping here by only speaking in generalities. You’ll need to give anyone who wants to help something more concrete to go on. Please be aware that no-one else can just look at your notebook based on the notebook ID. Only the course staff can do that and they are (generally speaking) too busy to be hanging around here answering questions.

How about telling us the shapes of the objects involved and/or showing us an actual exception trace that you are getting?

Hi @paulinpaloalto
Thanks for the advice. I wasn’t sure if I was allowed to post code on the forum or not.

In the assigned function compute_pca, I believe that I am computing the eigen_vecs_sorted and eigen_vecs_subset correctly:

    # sort eigenvectors using the idx_sorted_decreasing indices
    eigen_vecs_sorted = eigen_vecs[:, idx_sorted_decreasing]

    # select the first n eigenvectors (n is desired dimension
    # of rescaled data array, or dims_rescaled_data)
    eigen_vecs_subset = eigen_vecs_sorted[:,0:2]

It is this part I can’t seem to get right:

    # transform the data by multiplying the transpose of the eigenvectors
    # with the transpose of the de-meaned data,
    # then take the transpose of that product.
    X_reduced = np.transpose(np.dot(eigen_vecs_subset.T, X_demeaned.T))

This “works” in the sense that the output matches the example output, but 2 of my unit tests are failing. E.g.:

Wrong output shape. Check if you are taking the proper number of dimensions.
	Expected: (5, 3).
	Got: (5, 2).

Any suggestions?

Also (on a side note), does anyone know the difference between multiplying two numpy arrays vs. taking the dot product? E.g.:

 eigen_vecs_subset.T * X_demeaned.T

versus

np.dot(eigen_vecs_subset.T, X_demeaned.T)

Bill

The fact that you are asking that question about the meaning of the two forms of matrix multiply means you’re in trouble. Understanding at least basic Linear Algebra is a prerequisite to any of the DLAI courses, NLP included.

Here’s a thread from DLS that talks about this in a bit more detail and gives links to some Linear Algebra courses.

The short answer is that * or np.multiply is “elementwise” matrix multiply. The two operands must be the exact same shape (or broadcastable to the same shape, which is explained on that other thread). The result is the same shape, and each element is just the product of the corresponding elements of the two operands.
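A quick numpy illustration of the elementwise case (shapes chosen just for the example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[10.0, 20.0],
              [30.0, 40.0]])

# Elementwise (Hadamard) product: same shape in, same shape out,
# each result element is the product of the corresponding elements
print(A * B)
# [[ 10.  40.]
#  [ 90. 160.]]

# np.multiply is the same operation as *
print(np.array_equal(A * B, np.multiply(A, B)))  # True
```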

np.dot is a completely different animal. That is what mathematicians call “matrix multiply”. Mathematicians call elementwise multiply the “Hadamard product”. For real matrix multiply, each element of the result is the vector dot product of one row of the first operand with one column of the second operand. So if you are computing A \cdot B then the “inner” dimensions need to match. If A is n x k and B is k x m, then the result is n x m. And if the k values don’t match, it just throws an error.
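And the shape rule for real matrix multiply, using throwaway all-ones matrices:

```python
import numpy as np

# A is 2 x 3, B is 3 x 4: the "inner" dimensions (3) match,
# so the result is 2 x 4
A = np.ones((2, 3))
B = np.ones((3, 4))
print(np.dot(A, B).shape)  # (2, 4)

# If the inner dimensions don't match, np.dot throws an error
try:
    np.dot(A, np.ones((5, 4)))
except ValueError as err:
    print("shape mismatch:", err)
```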

The transpose of a matrix means to reflect the elements about the “main diagonal”, which runs from upper left to lower right. If A is n x k then A^T will be k x n. So it is the case that for any matrix, A \cdot A^T and A^T \cdot A will work from a dimensionality perspective.
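You can check that dimensionality claim directly:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # A is 2 x 3
print(A.T.shape)                 # (3, 2)

# Both orders of multiplying A with its transpose are legal:
print(np.dot(A, A.T).shape)      # (2, 2)
print(np.dot(A.T, A).shape)      # (3, 3)
```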

Thanks paulinpaloalto.
Any ideas about why my eigenvector subset is not working?

Sorry, as I mentioned earlier, I have not done that assignment yet. I might have time to take a look later today, but no guarantees. There are actual NLP mentors, so your best hope is that one of them will check in and respond to your question.

I think you should check your eigen_vecs_subset. The dimensions might not be right.

The dimensions are working out:

eigen_vecs_subset = eigen_vecs_sorted[:,0:2] # -> dim (10, 2)

# eigen_vecs_subset.T -> dim (2, 10)
# X_demeaned.T -> dim (10, 3)
# (2 x 10) x (10 x 3) -> (2 x 3) -> transpose -> (3 x 2)
X_reduced = np.transpose(np.dot(eigen_vecs_subset.T, X_demeaned.T))

The output (X_reduced) matches the example in the notebook:

[[ 0.43437323  0.49820384]
 [ 0.42077249 -0.50351448]
 [-0.85514571  0.00531064]]

However, 2 of my unit tests are failing. E.g.:

Wrong output shape. Check if you are taking the proper number of dimensions.
	Expected: (5, 3).
	Got: (5, 2).

Well, maybe the dimensions of the inputs in that other test case are different. And you’re somehow “hard-coding” something, e.g. by referencing global variables instead of the formal parameters.

That was it! I forgot to change

eigen_vecs_subset = eigen_vecs_sorted[:,0:2] # <- hard coded

to

eigen_vecs_subset = eigen_vecs_sorted[:,0:n_components]

Thanks for your help!!!

Cool! Glad to hear that you found the issue. There’s a “meta” lesson there: it’s always a mistake to “hard-code” things unless you literally have no choice. In that case they typically make a point of telling you to do the hard-coding.
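For anyone landing on this thread later, here is a minimal sketch of the steps discussed above with the dimension properly parameterized. Note this is an assumption-laden illustration, not the graded compute_pca solution: pca_transform_sketch is a made-up name, and the exact covariance/eigendecomposition calls in the assignment may differ.

```python
import numpy as np

def pca_transform_sketch(X, n_components):
    """Sketch of the PCA projection discussed in this thread (not the graded code)."""
    # De-mean the data: subtract the per-column (per-feature) mean
    X_demeaned = X - np.mean(X, axis=0)

    # Covariance matrix of the de-meaned data (features in columns)
    cov = np.cov(X_demeaned, rowvar=False)

    # Eigendecomposition of the symmetric covariance matrix
    eigen_vals, eigen_vecs = np.linalg.eigh(cov)

    # Sort eigenvectors by decreasing eigenvalue
    idx_sorted_decreasing = np.argsort(eigen_vals)[::-1]
    eigen_vecs_sorted = eigen_vecs[:, idx_sorted_decreasing]

    # Keep the first n_components eigenvectors -- NOT a hard-coded 2
    eigen_vecs_subset = eigen_vecs_sorted[:, 0:n_components]

    # Project: (n_components x d) . (d x m), then transpose -> (m x n_components)
    return np.dot(eigen_vecs_subset.T, X_demeaned.T).T

X = np.random.rand(5, 10)
print(pca_transform_sketch(X, 3).shape)  # (5, 3)
```

With n_components passed through instead of hard-coded, the same function produces (5, 2) or (5, 3) output depending on what the test asks for, which is exactly what the failing unit test was checking.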