Hi, I am trying to do the final lab for week 3 of Natural Language Processing with Classification and Vector Spaces:
Assignment: Vector Space Models | Coursera
My problem is with the final question, the compute_pca function.
This is my attempt. There is something I am doing wrong about which way round each of the matrices is. The incoming X matrix for the example is of shape (3, 10), and I am sure that the covariance matrix I need is of shape (10, 10). Just calling np.cov(X_demeaned) gives a (3, 3) covariance matrix, so I am fairly confident that np.cov(X_demeaned, rowvar=False) is correct. But from then on I don't have any confidence that the matrices or vectors are the right way round. I have to call eigen_vecs_subset.T to make the final dot product work, but the numbers it produces are wrong, so something somewhere is the wrong way round. I have tried so many different permutations of transposing things, taking columns instead of rows for subsets, etc., but I haven't been able to guess it. Can you help? Thank you!
Posting code from graded cell functions is against community guidelines; the code has been removed as a violation of the Code of Conduct. Kindly post a screenshot of the error, or your output together with the expected output. If a mentor wants to see your code, they will ask for it.
I instrumented my code for compute_pca to show the dimensions of all the relevant objects, and here's what I see with code that passes the tests:
X.shape (3, 10)
X_demeaned.shape (3, 10)
covariance_matrix.shape (10, 10)
eigen_vals [-7.03941390e-17 -3.60417070e-17 -1.30858621e-17 -8.61317229e-19
2.07977247e-19 3.78308880e-18 1.81729034e-17 5.06232858e-17
2.50881048e-01 5.48501886e-01]
idx_sorted [0 1 2 3 4 5 6 7 8 9]
eigen_vecs_subset.shape (10, 2)
X_reduced.shape (3, 2)
Your original matrix was (3, 10) and it became:
[[ 0.43437323 0.49820384]
[ 0.42077249 -0.50351448]
[-0.85514571 0.00531064]]
Please compare that to what you are getting and let us know if that sheds any light.
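To illustrate the rowvar point discussed earlier in the thread, here is a small standalone NumPy sketch (a toy example of my own, not the graded assignment code) showing how `rowvar` changes which axis `np.cov` treats as the variables:

```python
import numpy as np

# Standalone sketch (not the assignment code): how rowvar affects
# the covariance matrix shape for a (3, 10) data matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))

# Default: each ROW is treated as a variable -> (3, 3) covariance.
cov_rows = np.cov(X)

# rowvar=False: each COLUMN is a variable -> (10, 10) covariance,
# which matches the (10, 10) shape shown in the instrumented output.
cov_cols = np.cov(X, rowvar=False)

print(cov_rows.shape)  # (3, 3)
print(cov_cols.shape)  # (10, 10)
```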
OK, yes!
First of all, posting the eigen_vals helped me find that I was demeaning X with the mean of all of X, not each row of X with the mean of that row.
Secondly, the problem was the sort of the eigenvalues, where I had to transpose the eigenvector matrix, sort the transposed matrix, and transpose it back.
And now it works, thank you!
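As a side note on the demeaning fix above, here is a small standalone demo (my own toy example, not the graded code) of the difference between subtracting one global mean and subtracting an axis-wise mean:

```python
import numpy as np

# Toy demo of the demeaning pitfall: np.mean(X) collapses everything
# to a single scalar, while passing an axis gives one mean per
# column or per row, which is what PCA-style centering needs.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

global_demeaned = X - np.mean(X)                      # one scalar mean
col_demeaned = X - np.mean(X, axis=0)                 # one mean per column
row_demeaned = X - np.mean(X, axis=1, keepdims=True)  # one mean per row

print(col_demeaned.mean(axis=0))  # all zeros: columns are centered
print(row_demeaned.mean(axis=1))  # all zeros: rows are centered
```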
Hi there, and thank you so much for your help in general, and for giving debugging information like this to help figure out what's going on.
When I try to emulate what you put here, I get the following:
X.shape (3, 10)
X_demeaned.shape (3, 10)
covariance_matrix.shape (10, 10)
eigen_vals [-7.03941390e-17 -3.60417070e-17 -1.30858621e-17 -8.61317229e-19
2.07977247e-19 3.78308880e-18 1.81729034e-17 5.06232858e-17
2.50881048e-01 5.48501886e-01]
idx_sorted [0 1 2 3 4 5 6 7 8 9]
eigen_vecs_subset.shape (10, 2)
X_reduced.shape (3, 2)
Your original matrix was (3, 10) and it became:
[[-0.1697529 -0.09637353]
[ 0.29692971 0.18346481]
[-0.12717681 -0.08709128]]
As far as I can tell, it's exactly the same as yours for everything but the result, which at least tells me, via eigen_vals and idx_sorted, that I'm doing it right up to that point.
All of my earlier tests when I submit are correct except the last 2 (so I'm 8 of 10 correct), so I think I'm doing it right generally.
Also, I've read your feedback to others and seen the issue with word_embeddings, but unless I'm mistaken, the assignment has been altered so that word_embedding is literally the name of the local dictionary passed into get_country, so I don't believe I'm in error to use that name inside the get_country function.
Also, just because it occurs to me to do it:
- For my eigen_vecs, the first number in the first row is -8.62414327e-01.
- For my eigen_vec_sorted, the first number in the first row is 1.58992560e-01.
- For my eigen_vecs_subset, the first number in the first row is 0.15899256.
- My X_reduced is just made with a dot product of X_demeaned and eigen_vecs_subset.
Thank you in advance, and please let me know if there’s anything else I can provide.
Sincerely,
Matt
You’re exactly right that they issued a fix to this assignment to eliminate the common mistake of directly referencing the global variable word_embeddings.
As to the issue with compute_pca, it looks like you must have almost everything right. I haven't managed to come up with a theory for what could be wrong, so perhaps the better thing is just to look at your code. We can't do that on a public thread, but please check your DMs for a message from me about how to proceed.
To close the loop on the public thread: it took some serious code examination to find the issue. One subtle landmine that can be stepped on is that when sorting the eigenvector matrix by the magnitude of the eigenvalues, remember that the eigenvectors are the columns of that matrix, not the rows.
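To illustrate that column-sorting point with a standalone toy example (again, not the graded code): with `np.linalg.eigh`, sorting by eigenvalue must reorder columns, i.e. `eigen_vecs[:, idx]` rather than `eigen_vecs[idx]`:

```python
import numpy as np

# Toy demo: eigenvectors from np.linalg.eigh are the COLUMNS of the
# returned matrix, so a sort by eigenvalue must index columns.
A = np.array([[2.0, 0.0],
              [0.0, 5.0]])
eigen_vals, eigen_vecs = np.linalg.eigh(A)

idx = np.argsort(eigen_vals)[::-1]  # largest eigenvalue first
vecs_sorted = eigen_vecs[:, idx]    # correct: reorder the columns
vals_sorted = eigen_vals[idx]

# Each sorted column is still an eigenvector for the matching value.
for k in range(2):
    v = vecs_sorted[:, k]
    assert np.allclose(A @ v, vals_sorted[k] * v)
```

Indexing with `eigen_vecs[idx]` instead would shuffle the rows, silently breaking the eigenvalue-eigenvector pairing while keeping all the shapes the same, which is exactly why this bug is hard to spot.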