C1_W3_Assignment, regarding compute_pca()

Here is the compute_pca() function that I wrote. Since X is of shape (m, n), I first transposed it to (n, m) and then demeaned it using axis=0, i.e., along the columns. And since np.linalg.eigh() returns the eigenvector matrix with the eigenvectors along the columns, I took that into consideration as well.
But I am getting "4 Tests passed, 2 Tests failed". Why?

Here is my code for compute_pca():

    # UNQ_C5 GRADED FUNCTION: compute_pca
    def compute_pca(X, n_components=2):
        """
        Input:
            X: of dimension (m,n) where each row corresponds to a word vector
            n_components: Number of components you want to keep.
        Output:
            X_reduced: data transformed in 2 dims/columns + regenerated original data

        pass in: data as 2D NumPy array
        """

        ### START CODE HERE ###
        # {Code removed by moderator as it is against community guidelines to post grader cell code}
        ### END CODE HERE ###

        return X_reduced.T

Output:

    Your original matrix was (3, 10) and it became:
    [[-0.0283729  -0.34902436]
     [ 0.33815561  0.5218341 ]
     [-0.9307405   0.30691778]]

Test your function:

    w3_unittest.test_compute_pca(compute_pca)

Output:

    Wrong accuracy output.
    Expected: [[ 0.43437323  0.49820384]
     [ 0.42077249 -0.50351448]
     [-0.85514571  0.00531064]].
    Got: [[-0.0283729  -0.34902436]
     [ 0.33815561  0.5218341 ]
     [-0.9307405   0.30691778]].
    Wrong accuracy output.
    Expected: [[-0.32462796  0.01881248 -0.51389463]
     [-0.36781354  0.88364184  0.05985815]
     [-0.75767901 -0.69452194  0.12223214]
     [ 1.01698298 -0.17990871 -0.33555475]
     [ 0.43313753 -0.02802368  0.66735909]].
    Got: [[-0.1326253   0.09571936 -0.38617588]
     [-0.14340009 -0.99171418 -0.19149589]
     [-0.67081752  0.36642629  0.16134231]
     [ 1.15070158  0.13396776 -0.32109384]
     [ 0.57325197 -0.23778567  0.68072047]].

    4 Tests passed
    2 Tests failed

Hi @God_of_Calamity

Note the Hints:

Use numpy.cov(m, rowvar=True) which takes one required parameter. You need to specify the optional argument rowvar for this exercise. This calculates the covariance matrix. By default rowvar is True. From the documentation: “If rowvar is True (default), then each row represents a variable, with observations in the columns.” In our case, each row is a word vector observation, and each column is a feature (variable).
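
To see what this means in practice, here is a tiny sketch (made-up numbers, not the assignment data) of how rowvar changes which axis np.cov treats as the variables:

```python
import numpy as np

# Toy data: 3 observations (rows) x 2 features (columns) -- invented for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 5.0],
              [4.0, 9.0]])

# rowvar=True (the default): each ROW is a variable -> 3x3 covariance matrix.
print(np.cov(A).shape)                # (3, 3)

# rowvar=False: each COLUMN is a variable, matching our (m, n) layout -> 2x2.
print(np.cov(A, rowvar=False).shape)  # (2, 2)
```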

Also do not edit code outside START CODE HERE and END CODE HERE. Note:

  • return X_reduced and not return X_reduced.T

Cheers

But since I did X = X.T, in my case each column is an observation and each row represents a feature. That's why I did:
covariance_matrix = np.cov(X_demeaned), since rowvar=True is the default.
Also, I am returning X_reduced.T because eigen_vecs_subset is of shape (10, 2) and X_demeaned is of shape (10, 3), so:
np.matmul(eigen_vecs_subset.T, X_demeaned) is of shape (2, 3), and I return the transpose, which is of shape (3, 2).
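
As a shape-only sanity check (random placeholders with the dimensions described above, not the real embeddings), the matmul does line up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder arrays with the shapes from the post above (not the real data).
eigen_vecs_subset = rng.standard_normal((10, 2))  # (n_features, n_components)
X_demeaned = rng.standard_normal((10, 3))         # (n_features, m) after X = X.T

X_reduced = np.matmul(eigen_vecs_subset.T, X_demeaned)  # shape (2, 3)
print(X_reduced.T.shape)                                # (3, 2) -- one row per word vector
```

So the shapes work out; the problem, as the reply below points out, is the demeaning axis.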

I didn't notice that you first did X = X.T. In this case you demeaned over the wrong axis (after the transpose, you should demean over each row, not each column).

In any case this adds unnecessary complexity (lots of transpositions without a real benefit). Setting rowvar=False is way easier than transposing matrices all over the code.

Cheers

X is of shape (m, n), i.e., each row represents a word vector and each column is a feature. So, as asked, I did:
X_demeaned = X - np.mean(X, axis=1, keepdims=True)
axis=1 calculates the mean over each row, so from each sample we subtract its own mean to demean it.

covariance_matrix = np.cov(X_demeaned, rowvar=False)
rowvar=False, since each row is a sample, with the columns representing features.

But I am still getting the wrong output. Why?

If you now do not transpose X (as you did previously), then X_demeaned should be obtained by:
X - np.mean(X, axis=0)

As per Hints:

Use numpy.mean(a,axis=None) which takes one required parameter. You need to specify the optional argument axis for this exercise: If you set axis = 0, you take the mean for each column. If you set axis = 1, you take the mean for each row. Remember that each row is a word vector, and the number of columns are the number of dimensions in a word vector.
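
Putting both hints together, a minimal sketch of the standard eigendecomposition PCA recipe would look like this (an illustration only, not the graded solution):

```python
import numpy as np

def compute_pca_sketch(X, n_components=2):
    """Illustrative PCA; X has shape (m, n) with one word vector per row."""
    # Demean each column (feature): axis=0 takes the mean of each column.
    X_demeaned = X - np.mean(X, axis=0)

    # Covariance of the features: rowvar=False because the columns are the variables.
    covariance_matrix = np.cov(X_demeaned, rowvar=False)

    # eigh returns eigenvalues in ascending order; eigenvectors sit in the COLUMNS.
    eigen_vals, eigen_vecs = np.linalg.eigh(covariance_matrix)

    # Keep the eigenvectors belonging to the n_components largest eigenvalues.
    idx_decreasing = np.argsort(eigen_vals)[::-1]
    eigen_vecs_subset = eigen_vecs[:, idx_decreasing[:n_components]]  # (n, n_components)

    # Project: (m, n) @ (n, n_components) -> (m, n_components); no transposes needed.
    return np.matmul(X_demeaned, eigen_vecs_subset)
```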


Thanks, now it works.
But what I don't understand is this: if we do
X - np.mean(X, axis=0), we take the mean of each column and subtract it from that column.
I would find it more intuitive to take the mean of each row and subtract it from that row, since the samples are stored in the rows; that way we would basically be removing each sample's mean from its elements/features.
Is there a reason why we do this along the columns?

The columns represent the individual features. That’s what we’re trying to normalize w.r.t. Before normalization, each feature could have a different range. That’s the point, right?
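
A tiny invented example of what demeaning along axis=0 does: each column (feature) ends up with zero mean, which is exactly the per-feature centering PCA needs:

```python
import numpy as np

# Toy data: 3 samples (rows) x 2 features (columns) -- values invented for illustration.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

X_demeaned = X - np.mean(X, axis=0)  # subtract each column's mean (2 and 200)
print(X_demeaned)                    # [[-1, -100], [0, 0], [1, 100]]
print(X_demeaned.mean(axis=0))       # [0. 0.] -> every feature now has zero mean
```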

Yeah, now I get it. Thanks!

I would like to add one detail for future readers.

Technically, we do not normalize (or standardize / z-score normalize) in this case; we "just" center the data. In other words, we only subtract the mean, without dividing by the standard deviation, so the variance (and the range) stays the same.

With vanilla tabular data we would "have" to standardize the values, since features with larger values would otherwise dominate. So for tabular data we standardize each feature to treat them all, in loose terms, as "initially" equally important.

In this case we do not treat every feature as equally important; we let some features have a larger or smaller range. In other words, we allow a feature with higher variance a higher chance of dominating the first principal component.
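
To make that distinction concrete, here is a small sketch with invented numbers contrasting plain centering (what this exercise does) with full z-score standardization (what we would typically do for tabular data):

```python
import numpy as np

# Invented data: feature 0 has a much larger spread than feature 1.
X = np.array([[10.0, 0.1],
              [20.0, 0.2],
              [60.0, 0.3]])

centered = X - X.mean(axis=0)            # centering only: what we do here
standardized = centered / X.std(axis=0)  # z-score: centering + unit variance

print(centered.var(axis=0))      # very different variances -> feature 0 can dominate PC1
print(standardized.var(axis=0))  # [1. 1.] -> features made "equally important"
```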

Cheers
