C1_W3_Assignment, regarding compute_pca()

Here is the compute_pca() function that I wrote. Since X is of shape (m, n), I first transposed it to (n, m) and then demeaned it using axis=0, i.e., along the columns. Since np.linalg.eigh() returns a matrix whose columns are the eigenvectors, I also took that into account.
But I am getting “4 Tests passed 2 Tests Failed”. Why?

Here is my code for compute_pca():


def compute_pca(X, n_components=2):
X: of dimension (m,n) where each row corresponds to a word vector
n_components: Number of components you want to keep.
X_reduced: data transformed in 2 dims/columns + regenerated original data
pass in: data as 2D NumPy array

{Codes removed by moderator as it is against community guidelines to post grader cell codes}

return X_reduced.T


Your original matrix was (3, 10) and it became:
[[-0.0283729 -0.34902436]
[ 0.33815561 0.5218341 ]
[-0.9307405 0.30691778]]

Test your function



Wrong accuracy output.
Expected: [[ 0.43437323 0.49820384]
[ 0.42077249 -0.50351448]
[-0.85514571 0.00531064]].
Got: [[-0.0283729 -0.34902436]
[ 0.33815561 0.5218341 ]
[-0.9307405 0.30691778]].
Wrong accuracy output.
Expected: [[-0.32462796 0.01881248 -0.51389463]
[-0.36781354 0.88364184 0.05985815]
[-0.75767901 -0.69452194 0.12223214]
[ 1.01698298 -0.17990871 -0.33555475]
[ 0.43313753 -0.02802368 0.66735909]].
Got: [[-0.1326253 0.09571936 -0.38617588]
[-0.14340009 -0.99171418 -0.19149589]
[-0.67081752 0.36642629 0.16134231]
[ 1.15070158 0.13396776 -0.32109384]
[ 0.57325197 -0.23778567 0.68072047]].

4 Tests passed

2 Tests failed

Hi @God_of_Calamity

Note the Hints:

Use numpy.cov(m, rowvar=True) which takes one required parameter. You need to specify the optional argument rowvar for this exercise. This calculates the covariance matrix. By default rowvar is True. From the documentation: “If rowvar is True (default), then each row represents a variable, with observations in the columns.” In our case, each row is a word vector observation, and each column is a feature (variable).

Also do not edit code outside START CODE HERE and END CODE HERE. Note:

  • return X_reduced and not return X_reduced.T


But since I did X = X.T, in my case each column is an observation and each row represents a feature. That’s why I did:
covariance_matrix = np.cov(X_demeaned), since rowvar=True is the default.
Also, I am returning X_reduced.T because eigen_vecs_subset is of shape (10, 2) and X_demeaned is of shape (10, 3), so
np.matmul(eigen_vecs_subset.T, X_demeaned) is of shape (2, 3), and I return the transpose, which is of shape (3, 2).
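
For readers following the shape bookkeeping, here is a minimal sketch of why the transposed route and the untransposed route give the same projection. The arrays are random stand-ins with the shapes quoted above (not the assignment data), and the names transposed_route and direct_route are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins with the shapes from the post (hypothetical values):
X_demeaned = rng.normal(size=(10, 3))         # features x observations (after X = X.T)
eigen_vecs_subset = rng.normal(size=(10, 2))  # two "eigenvector" columns

# Route used in the post: project, then transpose the (2, 3) result.
transposed_route = np.matmul(eigen_vecs_subset.T, X_demeaned).T

# Equivalent route without the final transpose: (3, 10) @ (10, 2) -> (3, 2).
direct_route = np.matmul(X_demeaned.T, eigen_vecs_subset)

print(transposed_route.shape)                       # (3, 2)
print(np.allclose(transposed_route, direct_route))  # True
```

So the shapes do work out in both routes. The transposes do not change the result by themselves, which suggests that the mismatch has to come from another step.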

I didn’t notice that you first did X = X.T. In that case you demeaned over the wrong axis: after the transpose each row is a feature, so you should subtract the mean of each row (axis=1).

In any case this adds unnecessary complexity (lots of transpositions without a real benefit). Setting rowvar=False is way easier than transposing matrices all over the code.
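
A small sketch of that point, assuming a random (3, 10) matrix in place of the assignment data: np.cov with rowvar=False gives the same covariance matrix as transposing first and keeping the default.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))  # 3 observations (rows), 10 features (columns)

# Option 1: tell np.cov that the columns are the variables.
cov_rowvar_false = np.cov(X, rowvar=False)

# Option 2: transpose so that rows are the variables (the default rowvar=True).
cov_transposed = np.cov(X.T)

print(cov_rowvar_false.shape)                         # (10, 10)
print(np.allclose(cov_rowvar_false, cov_transposed))  # True
```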


X is of shape (m, n), i.e., each row represents a word vector and each column is a feature. So, as asked, I did:
X_demeaned = (X - np.mean(X, axis=1, keepdims=True))
axis=1 calculates the mean of each row, so from each sample we subtract its own mean to demean it.

covariance_matrix = np.cov(X_demeaned, rowvar=False)
rowvar=False, since each row is a sample, with columns representing features.

But I am still getting the wrong output?

If you no longer transpose X (as you did previously), then X_demeaned should be obtained by:
X - np.mean(X, axis=0)

As per Hints:

Use numpy.mean(a,axis=None) which takes one required parameter. You need to specify the optional argument axis for this exercise: If you set axis = 0, you take the mean for each column. If you set axis = 1, you take the mean for each row. Remember that each row is a word vector, and the number of columns are the number of dimensions in a word vector.
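
To make the axis argument concrete, here is a tiny sketch with a made-up 3 x 2 matrix (3 word vectors, 2 dimensions). The values are illustrative only.

```python
import numpy as np

# Hypothetical data: 3 word vectors (rows), 2 dimensions (columns).
X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 60.0]])

col_means = np.mean(X, axis=0)  # one mean per column, shape (2,): 3 and 30
row_means = np.mean(X, axis=1)  # one mean per row, shape (3,)

# Broadcasting subtracts each column's mean from every entry in that column.
X_demeaned = X - col_means

print(col_means.shape, row_means.shape)  # (2,) (3,)
print(X_demeaned.mean(axis=0))           # every column now averages to 0
```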

Thanks, now it works.
But what I don’t understand is this: if we do
X - np.mean(X, axis=0), we take the mean of each column and subtract it from that column.
I would find it more intuitive to take the mean of each row and subtract it from that row, since each sample is stored in a row, so we would basically be removing each sample’s mean from its elements/features.
Is there any reason why we do this along the columns?

The columns represent the individual features. That’s what we’re trying to normalize w.r.t. Before normalization, each feature could have a different range. That’s the point, right?

Yeah, now I get it. Thanks.

I would like to add one detail for future readers.

Technically, we do not do normalization (or standardization/z-score normalization) in this case; we “just” center the data. In other words, we subtract the mean without dividing by the standard deviation, so the variance (or the range) of each feature stays the same.
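
A quick way to see the difference, using a made-up 3 x 2 matrix: centering leaves each column's variance untouched, while standardizing forces it to 1. The names X_centered and X_standardized are just for this sketch.

```python
import numpy as np

# Hypothetical data: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 60.0]])

X_centered = X - X.mean(axis=0)              # centering only
X_standardized = X_centered / X.std(axis=0)  # centering + scaling (z-score)

# Centering does not change the per-feature variance.
print(np.allclose(X.var(axis=0), X_centered.var(axis=0)))  # True

# Standardizing makes every feature's variance equal to 1.
print(np.allclose(X_standardized.var(axis=0), 1.0))        # True
```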

In vanilla tabular data we would “have” to standardize the values since the bigger valued features would dominate. So in tabular data we would standardize each feature to treat them, in loose terms, as “initially” equally important.

In this case we do not treat every feature as equally important; we let some features have a bigger or smaller range. In other words, a feature with higher variance has a higher chance of dominating the 1st principal component.
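
Here is a toy illustration of that last point, assuming two independent synthetic features where the first has a much larger spread. This is only a sketch on made-up data, not the assignment function: with centering alone, the first principal direction lines up almost entirely with the high-variance feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): feature 0 has std 10, feature 1 has std 1.
X = np.column_stack([
    rng.normal(scale=10.0, size=500),
    rng.normal(scale=1.0, size=500),
])

X_centered = X - X.mean(axis=0)         # center only, no scaling
cov = np.cov(X_centered, rowvar=False)  # columns are the features

# eigh returns eigenvalues in ascending order, eigenvectors as columns.
eigen_vals, eigen_vecs = np.linalg.eigh(cov)
first_pc = eigen_vecs[:, np.argmax(eigen_vals)]

# The direction points almost entirely along feature 0.
print(np.abs(first_pc))
```

If the same data were standardized first, both features would have variance 1 and neither would be favoured in this way.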