Hi, I am trying to do the final lab for week 3 of Natural Language Processing with Classification and Vector Spaces:
Assignment: Vector Space Models | Coursera
My problem is with the final question, the compute_pca function.
This is my attempt. There is something I am doing wrong about which way round each of the matrices is. The incoming X matrix for the example is of shape (3, 10), and I am sure that the covariance matrix I need is of shape (10, 10). Just calling np.cov(X_demeaned) gives a (3, 3) covariance matrix, so I am fairly confident that np.cov(X_demeaned, rowvar=False) is correct. But from then on I don't have any confidence that the matrices or vectors are the right way round. I have to call eigen_vecs_subset.T to make the final dot product work, but the numbers it produces are wrong, so something somewhere is the wrong way round. I have tried so many different permutations of transposing things, taking columns instead of rows for subsets, etc., but I haven't been able to guess it. Can you help? Thank you!
Posting code from graded cell functions is against community guidelines; the code has been removed as a violation of the Code of Conduct. Kindly post a screenshot of the error, or your output together with the expected output. If a mentor wants to see your code, they will ask for it.
I instrumented my code for compute_pca to show the dimensions of all the relevant objects, and here's what I see with code that passes the tests:
X.shape (3, 10)
X_demeaned.shape (3, 10)
covariance_matrix.shape (10, 10)
eigen_vals [-7.03941390e-17 -3.60417070e-17 -1.30858621e-17 -8.61317229e-19
2.07977247e-19 3.78308880e-18 1.81729034e-17 5.06232858e-17
2.50881048e-01 5.48501886e-01]
idx_sorted [0 1 2 3 4 5 6 7 8 9]
eigen_vecs_subset.shape (10, 2)
X_reduced.shape (3, 2)
Your original matrix was (3, 10) and it became:
[[ 0.43437323 0.49820384]
[ 0.42077249 -0.50351448]
[-0.85514571 0.00531064]]
Please compare that to what you are getting and let us know if that sheds any light.
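To illustrate the rowvar point discussed earlier in the thread, here is a small standalone NumPy sketch (a toy example of my own, not the graded assignment code) showing how `rowvar` changes which axis `np.cov` treats as the variables:

```python
import numpy as np

# Standalone sketch (not the assignment code): how rowvar affects
# the covariance matrix shape for a (3, 10) data matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10))

# Default: each ROW is treated as a variable -> (3, 3) covariance.
cov_rows = np.cov(X)

# rowvar=False: each COLUMN is a variable -> (10, 10) covariance,
# which matches the (10, 10) shape shown in the instrumented output.
cov_cols = np.cov(X, rowvar=False)

print(cov_rows.shape)  # (3, 3)
print(cov_cols.shape)  # (10, 10)
```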
OK, yes!
First of all, posting the eigen_vals helped me find that I was demeaning X with the mean of all of X, not each row of X with the mean of that row.
Secondly, the problem was the sort of the eigenvalues, where I had to transpose the eigenvector matrix, sort the transposed matrix, and transpose it back.
And now it works, thank you!
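As a side note on the demeaning fix above, here is a small standalone demo (my own toy example, not the graded code) of the difference between subtracting one global mean and subtracting an axis-wise mean:

```python
import numpy as np

# Toy demo of the demeaning pitfall: np.mean(X) collapses everything
# to a single scalar, while passing an axis gives one mean per
# column or per row, which is what PCA-style centering needs.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

global_demeaned = X - np.mean(X)                      # one scalar mean
col_demeaned = X - np.mean(X, axis=0)                 # one mean per column
row_demeaned = X - np.mean(X, axis=1, keepdims=True)  # one mean per row

print(col_demeaned.mean(axis=0))  # all zeros: columns are centered
print(row_demeaned.mean(axis=1))  # all zeros: rows are centered
```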
Hi there, and thank you so much for your help in general, and for giving debugging information like this to help figure out what's going on.
When I try to emulate what you put here, I get the following:
X.shape (3, 10)
X_demeaned.shape (3, 10)
covariance_matrix.shape (10, 10)
eigen_vals [-7.03941390e-17 -3.60417070e-17 -1.30858621e-17 -8.61317229e-19
2.07977247e-19 3.78308880e-18 1.81729034e-17 5.06232858e-17
2.50881048e-01 5.48501886e-01]
idx_sorted [0 1 2 3 4 5 6 7 8 9]
eigen_vecs_subset.shape (10, 2)
X_reduced.shape (3, 2)
Your original matrix was (3, 10) and it became:
[[-0.1697529 -0.09637353]
[ 0.29692971 0.18346481]
[-0.12717681 -0.08709128]]
As far as I can tell, it's exactly the same as yours for everything but the result, which at least tells me, via eigen_vals and idx_sorted, that I'm doing it right up to that point.
All of my earlier tests when I submit are correct except the last 2 (so I'm 8 of 10 correct), so I think I'm doing it right generally.
Also, I've read your feedback to others and seen the issue with word_embeddings, but unless I'm mistaken, the assignment has been altered so that word_embedding is literally the name of the local dictionary passed into get_country, so I don't believe I'm in error to use that name inside the get_country function.
Also, just because it occurs to me to do it:
- For my eigen_vecs, the first number in the first row is -8.62414327e-01.
- For my eigen_vec_sorted, the first number in the first row is 1.58992560e-01.
- For my eigen_vecs_subset, the first number in the first row is 0.15899256.
- My X_reduced is just made with a dot product of X_demeaned and eigen_vecs_subset.
Thank you in advance, and please let me know if there’s anything else I can provide.
Sincerely,
Matt
You’re exactly right that they issued a fix to this assignment to eliminate the common mistake of directly referencing the global variable word_embeddings.
As to the issue with compute_pca, it looks like you must have almost everything right. I haven't managed to come up with a theory for what could be wrong, so perhaps the better thing is just to look at your code. We can't do that on a public thread, but please check your DMs for a message from me about how to proceed.
To close the loop on the public thread: it took some serious code examination to find the issue. One subtle landmine that can be stepped on is that when sorting the eigenvector matrix by the magnitude of the eigenvalues, remember that the eigenvectors are the columns of that matrix, not the rows.
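To illustrate that column-sorting point with a standalone toy example (again, not the graded code): with `np.linalg.eigh`, sorting by eigenvalue must reorder columns, i.e. `eigen_vecs[:, idx]` rather than `eigen_vecs[idx]`:

```python
import numpy as np

# Toy demo: eigenvectors from np.linalg.eigh are the COLUMNS of the
# returned matrix, so a sort by eigenvalue must index columns.
A = np.array([[2.0, 0.0],
              [0.0, 5.0]])
eigen_vals, eigen_vecs = np.linalg.eigh(A)

idx = np.argsort(eigen_vals)[::-1]  # largest eigenvalue first
vecs_sorted = eigen_vecs[:, idx]    # correct: reorder the columns
vals_sorted = eigen_vals[idx]

# Each sorted column is still an eigenvector for the matching value.
for k in range(2):
    v = vecs_sorted[:, k]
    assert np.allclose(A @ v, vals_sorted[k] * v)
```

Indexing with `eigen_vecs[idx]` instead would shuffle the rows, silently breaking the eigenvalue-eigenvector pairing while keeping all the shapes the same, which is exactly why this bug is hard to spot.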