C1 W3 Exercise 5 Expected Results

Hello everyone,

Summary

I’m stuck on the fifth exercise, compute_pca. I see that it is troubling others as well, and I’m unable to find enough help in these forums. My output does not match the expected result.

The problem:

When I execute the following cell

# Testing your function
np.random.seed(1)
X = np.random.rand(3, 10)
X_reduced = compute_pca(X, n_components=2)
print("Your original matrix was " + str(X.shape) + " and it became:")
print(X_reduced)

My result is

[[ 0.23132424  0.43767745]
 [ 0.2177235  -0.56404087]
 [-1.0581947  -0.05521575]]

What I have tried:

  1. Verified that I “de-meaned” the input data (X - mean), resulting in an array of shape (3, 10) for this example
  2. Computed the covariance using numpy.cov(m, rowvar=False), resulting in a (10, 10) array for this example
  3. Computed the eigenvalues and eigenvectors
  4. Computed the sorted indices
  5. Extracted the first n_components sorted eigenvectors $$U_{n\_components}$$ using the [:, 0:n_components] slice syntax
  6. Computed the matrix multiplication $$X' = X_{\text{demeaned}} \, U_{n\_components}$$, resulting in a (3, 2) array
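For comparison, here is a minimal sketch of the six steps above. This is not the graded solution, just how I would wire the steps together; using np.linalg.eigh is my own choice (it suits the symmetric covariance matrix and returns eigenvalues in ascending order), and the notebook may expect a different eigensolver.

```python
import numpy as np

def compute_pca(X, n_components=2):
    """Project X (n_observations, n_features) onto its first
    n_components principal components."""
    # 1. De-mean each column (feature): mean over axis=0
    X_demeaned = X - np.mean(X, axis=0)
    # 2. Covariance of the features: rowvar=False -> (n_features, n_features)
    covariance_matrix = np.cov(X_demeaned, rowvar=False)
    # 3. Eigendecomposition; eigh is for symmetric matrices and returns
    #    eigenvalues in ascending order
    eigen_vals, eigen_vecs = np.linalg.eigh(covariance_matrix)
    # 4. Indices that sort the eigenvalues in descending order
    idx_sorted = np.argsort(eigen_vals)[::-1]
    # 5. Keep the first n_components eigenvectors (stored as columns)
    eigen_vecs_subset = eigen_vecs[:, idx_sorted][:, :n_components]
    # 6. Project the de-meaned data onto the reduced basis
    return np.dot(X_demeaned, eigen_vecs_subset)

np.random.seed(1)
X = np.random.rand(3, 10)
X_reduced = compute_pca(X, n_components=2)
print(X_reduced.shape)  # (3, 2)
```

Note that eigenvector signs are arbitrary, so individual entries of the projection can legitimately differ in sign from the course’s expected output.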

My final figure is the following.

[image: final output figure, not reproduced here]

Furthermore, here are the input and intermediate values.

X (the data)

  • Shape: (3, 10)
  • Value:
    [[4.17022005e-01 7.20324493e-01 1.14374817e-04 3.02332573e-01 1.46755891e-01 9.23385948e-02 1.86260211e-01 3.45560727e-01 3.96767474e-01 5.38816734e-01]
     [4.19194514e-01 6.85219500e-01 2.04452250e-01 8.78117436e-01 2.73875932e-02 6.70467510e-01 4.17304802e-01 5.58689828e-01 1.40386939e-01 1.98101489e-01]
     [8.00744569e-01 9.68261576e-01 3.13424178e-01 6.92322616e-01 8.76389152e-01 8.94606664e-01 8.50442114e-02 3.90547832e-02 1.69830420e-01 8.78142503e-01]]

X_demeaned

  • Shape: (3, 10)
  • Value:
    [[-0.01842585 0.28487664 -0.43533348 -0.13311528 -0.28869196 -0.34310926 -0.24918764 -0.08988713 -0.03868038 0.10336888]
     [-0.01625334 0.24977165 -0.2309956 0.44266958 -0.40806026 0.23501966 -0.01814305 0.12324197 -0.29506092 -0.23734636]
     [ 0.36529671 0.53281372 -0.12202368 0.25687476 0.4409413 0.45915881 -0.35040364 -0.39639307 -0.26561743 0.44269465]]

Covariance Matrix of X_demeaned

  • Shape: (10, 10)
  • Value:
    [[ 4.88046950e-02 3.38429177e-02 2.70410354e-02 1.33348089e-02 1.00609001e-01 6.57707762e-02 -2.75184938e-02 -5.25695002e-02 -1.27339493e-02 6.48227388e-02]
     [ 3.38429177e-02 2.38029956e-02 1.68919133e-02 3.98205302e-03 7.08994547e-02 4.03429340e-02 -2.12082910e-02 -3.84257778e-02 -6.48868839e-03 4.80954113e-02]
     [ 2.70410354e-02 1.68919133e-02 2.52986469e-02 3.65993232e-02 4.94545211e-02 6.56528268e-02 -3.45131361e-03 -1.81844337e-02 -2.00468908e-02 1.84664070e-02]
     [ 1.33348089e-02 3.98205302e-03 3.65993232e-02 8.63566931e-02 9.67985927e-03 1.00685091e-01 2.58818405e-02 1.66212907e-02 -4.02656116e-02 -3.16988513e-02]
     [ 1.00609001e-01 7.08994547e-02 4.94545211e-02 9.67985927e-03 2.11236189e-01 1.17774282e-01 -6.39199532e-02 -1.15041459e-01 -1.83299257e-02 1.44268308e-01]
     [ 6.57707762e-02 4.03429340e-02 6.56528268e-02 1.00685091e-01 1.17774282e-01 1.71350909e-01 -3.68356893e-03 -3.98590657e-02 -5.39476528e-02 3.79461186e-02]
     [-2.75184938e-02 -2.12082910e-02 -3.45131361e-03 2.58818405e-02 -6.39199532e-02 -3.68356893e-03 2.90038970e-02 4.21533131e-02 -7.67476391e-03 -5.65027401e-02]
     [-5.25695002e-02 -3.84257778e-02 -1.81844337e-02 1.66212907e-02 -1.15041459e-01 -3.98590657e-02 4.21533131e-02 6.82317479e-02 -6.40769360e-05 -8.83324737e-02]
     [-1.27339493e-02 -6.48868839e-03 -2.00468908e-02 -4.02656116e-02 -1.83299257e-02 -5.39476528e-02 -7.67476391e-03 -6.40769360e-05 1.96830541e-02 5.06165683e-03]
     [ 6.48227388e-02 4.80954113e-02 1.84664070e-02 -3.16988513e-02 1.44268308e-01 3.79461186e-02 -5.65027401e-02 -8.83324737e-02 5.06165683e-03 1.15614106e-01]]

{Moderator’s Edit: Please mention Lab ID only when explicitly asked by someone}

Discovery

In my experience, the first and second hints for the exercise were misleading.

Hints 1 and 2

 * Use numpy.mean(a,axis=None) : If you set axis = 0, you take the mean for each column. If you set axis = 1, you take the mean for each row. Remember that each row is a word vector, and the number of columns are the number of dimensions in a word vector. 
 * Use numpy.cov(m, rowvar=True) . This calculates the covariance matrix. By default rowvar is True. From the documentation: "If rowvar is True (default), then each row represents a variable, with observations in the columns." In our case, each row is a word vector observation, and each column is a feature (variable). 
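To see concretely what rowvar changes, here is a quick check on the same (3, 10) test input used above (my own sanity-check snippet, not part of the assignment):

```python
import numpy as np

np.random.seed(1)
X = np.random.rand(3, 10)  # 3 observations (rows), 10 features (columns)

# rowvar=True (the default) treats each ROW as a variable -> (3, 3)
cov_default = np.cov(X, rowvar=True)
# rowvar=False treats each COLUMN as a variable -> (10, 10),
# which is what this PCA exercise needs
cov_features = np.cov(X, rowvar=False)

print(cov_default.shape, cov_features.shape)  # (3, 3) (10, 10)
```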

As soon as I changed the mean calculation to numpy.mean(a, axis=0) and the covariance calculation to numpy.cov(m, rowvar=False), the unit tests passed!
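The effect of the axis argument can be checked directly on the same test input (again, just a sanity-check sketch of my own):

```python
import numpy as np

np.random.seed(1)
X = np.random.rand(3, 10)

col_means = np.mean(X, axis=0)  # one mean per feature/column -> shape (10,)
row_means = np.mean(X, axis=1)  # one mean per word vector/row -> shape (3,)
print(col_means.shape, row_means.shape)  # (10,) (3,)

# De-meaning with the column means broadcasts over the (3, 10) array,
# and each column of the result averages to (numerically) zero
X_demeaned = X - col_means
print(np.allclose(X_demeaned.mean(axis=0), 0))  # True
```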

Hey @MattHo,
Welcome, and we are glad that you could be a part of our community :partying_face: Thanks a lot for letting us know that your issue has been resolved. As for the hints, they mention the functions with their default hyper-parameters. If you read the hints completely, you will find that this has been highlighted explicitly. I hope this helps.

Cheers,
Elemento

Hello @Elemento ,

As for the hints, they mention the functions with their default hyper-parameters. If you read the hints completely, you will find that it has been highlighted explicitly. I hope this helps.

Thank you for attempting to clarify the problem. I appreciate your time. Now that you explained the ‘highlight’ was intended to provide context, I should understand future hints better.

Let me critique your response, as I believe the hint instructions are pedagogically flawed. I read the hints completely and multiple times, and it was not clear that these were the ‘defaults’. It was also not clear that the ‘highlight’ was meant to tell the student to vary the default. I suggest a simple rewrite like the following, where I use an ellipsis to indicate that nothing else changes.

* Use numpy.mean, which takes one required parameter. You need to specify the optional argument axis for this exercise: If you set axis = 0, [...] in a word vector.
* Use numpy.cov, which takes one required parameter. You need to specify the optional argument rowvar for this exercise. This calculates the [...] feature (variable).

Hey @MattHo,
Thanks a lot for the follow-up. Let me pass your suggestions to the team, and they will update the hints as they deem fit to be best for the learners.

Cheers,
Elemento

Hey @MattHo,
The hints have been modified for easier interpretability. Thanks once again for your feedback.

Cheers,
Elemento