Hello,
May I know, when we make a scatter plot of PCA-transformed data, what Machine Learning Engineers normally check in the plotted graph? For example, do people check whether the 2 features are overlapping or not? From the Week 3 lab “C1_W3_lecture_nb_03_pca”, I am not sure what I need to check in the graph. Is there any important information in the graph?
Thank you.
Hey @Jasmine3,
In NLP, the most common thing we use PCA for is evaluating our word embeddings, which you will learn more about in Course 2 Week 4. There are usually 2 kinds of evaluations that we can perform, intrinsic and extrinsic, and PCA is one of the methods for performing intrinsic evaluation.
Without going into too much detail, let me try to highlight the key point. Suppose we have embeddings of, say, 100 elements each, i.e., each word has a 100-dimensional embedding. Using PCA, we can transform these embeddings into 2 or 3 dimensions and then use a scatter plot to verify whether similar words’ embeddings are clustered together or not. If yes, then we can say that our embeddings are good to some extent, and vice versa.
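Here is a minimal sketch of that idea using scikit-learn and matplotlib; the word list and the random 100-dimensional vectors are just placeholders standing in for real embeddings, not anything from the course:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical embeddings: word -> random 100-d vector (placeholder data)
rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "apple", "banana"]
embeddings = {w: rng.normal(size=100) for w in words}

X = np.stack([embeddings[w] for w in words])   # shape (n_words, 100)

# Reduce to 2 dimensions and scatter-plot, to eyeball whether similar words cluster
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                    # shape (n_words, 2)

plt.scatter(X_2d[:, 0], X_2d[:, 1])
for word, (x, y) in zip(words, X_2d):
    plt.annotate(word, (x, y))                 # label each point with its word
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```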
Of course, there are other methods as well, such as t-SNE (t-distributed Stochastic Neighbor Embedding), with which we can achieve the same.
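For completeness, the same visualisation can be done with t-SNE (again only a sketch, reusing the placeholder matrix X from above; note that perplexity has to be smaller than the number of samples):

```python
from sklearn.manifold import TSNE

# Project the same hypothetical embeddings to 2-D with t-SNE instead of PCA
X_tsne = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(X)
```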
Additionally, at times, PCA might not be able to produce meaningful representations of the word embeddings (such as the overlapping representations you mentioned); hence, we don’t rely solely on intrinsic evaluation, we use extrinsic evaluations as well.
As for the ungraded lab, the major insight you can take away is how PCA works under the hood, for instance, the rotation matrix. Also, we can understand how our 2D data will look if we project it onto 1 dimension, depending on which axis we choose, and so on.
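To make that concrete, here is a small sketch (using synthetic, correlated 2D data, not the lab’s data) of projecting 2D data onto its first principal component; pca.components_ holds the direction (the rows of the rotation matrix) used for the projection:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 2-D data with strong correlation between the two columns
rng = np.random.default_rng(1)
x = rng.normal(size=500)
data_2d = np.column_stack([x, 2 * x + 0.3 * rng.normal(size=500)])

# Project onto 1 dimension (the direction of largest variance)
pca = PCA(n_components=1)
data_1d = pca.fit_transform(data_2d)

print(pca.components_)               # the (1, 2) direction / rotation row used for projection
print(pca.explained_variance_ratio_) # how much of the variance the 1-D projection keeps
```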

I hope this helps.
Cheers,
Elemento
Hi there,
in addition to @Elemento's very good answer:
In general, a PCA can also help in your Data Preparation to:
- evaluate whether your features carry redundant information (and, relatedly, how much additional information each extra dimension contributes, see the next point)
- transform your features into the space spanned by the principal components (which usually goes along with a reduction of dimensionality)
It can be done either in an embedding space or in a feature space; see the sketch below.
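A minimal sketch of both points, on a synthetic feature matrix where one column is nearly a copy of another (the data here is purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
a = rng.normal(size=(1000, 1))
b = rng.normal(size=(1000, 1))
# The third feature is almost a copy of the first -> redundant information
X = np.hstack([a, b, a + 0.01 * rng.normal(size=(1000, 1))])

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)   # two sizeable values, one near zero -> redundancy

# Transform the features into the principal-component space, keeping only
# the components that explain (almost) all of the variance
X_reduced = PCA(n_components=2).fit_transform(X)
```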
Best regards
Christian