While going through embeddings, Prof. Ng mentioned that when an embedding is reduced to two dimensions using PCA for 2D visualisation, the new values lose a lot of information but are easier to plot. How does this process decide which information is kept and which is left out?
Following on from that doubt, won't losing information affect the quality of the vectorisation? For example, if I vectorise 4 sentences, will any major part of a sentence be lost?
From my understanding of the lectures, using PCA does lead to some information loss, but PCA is an algorithm that automatically determines the most important aspects to retain (the directions along which the data varies the most). The information lost will be less important than what is retained.
But that is just for visualising the relationship.
For computing pairwise similarities, even in the lecture, the full embeddings are used. Likewise, the full embeddings will be used for training. So there is no concern about information loss, at least from my understanding of the explanation in the lectures.
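To make the distinction concrete, here is a minimal sketch. The embeddings are random stand-ins (4 hypothetical sentences, 768 dimensions, both numbers chosen only for illustration), not output from any real embedding model. Similarities are computed on the full vectors, and PCA is used only to get 2D coordinates for a plot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Assumption: random vectors standing in for 4 sentence embeddings of size 768.
embeddings = rng.normal(size=(4, 768))

# Pairwise similarities use the full embeddings, so nothing is lost here.
full_sim = cosine_similarity(embeddings)

# PCA is applied only to obtain 2D points for plotting.
coords_2d = PCA(n_components=2).fit_transform(embeddings)

# Similarities recomputed from the 2D points generally differ from the
# full ones, which is why the 2D version is kept for visualisation only.
reduced_sim = cosine_similarity(coords_2d)
max_gap = np.abs(full_sim - reduced_sim).max()

print("full similarity matrix shape:", full_sim.shape)
print("2D coordinates shape:", coords_2d.shape)
print("largest difference between full and 2D similarities:", max_gap)
```

The original `embeddings` array is never modified by PCA, so it remains available for similarity computation or training.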
Can I learn more about the PCA algorithm, so that I can understand how it decides which information to keep and which to lose?
The PCA algorithm takes data in many dimensions and reduces it to, say, two or three dimensions. Its most common use case is data visualization. It defines new axes and projects the data onto them, choosing the axes so that as much of the variance in the data as possible is preserved and the loss of information is minimal.
It can be implemented using scikit-learn (sklearn), for example with sklearn.decomposition.PCA.
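As a rough illustration of how to see what PCA keeps, here is a small sketch using scikit-learn. The data is synthetic and made up for this example: 200 points in 10 dimensions that are mostly driven by 2 hidden factors plus a little noise. The `explained_variance_ratio_` attribute reports the share of the total variance captured by each new axis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Assumption: synthetic data, 2 hidden factors mixed into 10 dimensions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Fit PCA and project the data onto the 2 new axes.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Share of the total variance captured by each of the 2 axes.
print("explained variance ratio:", pca.explained_variance_ratio_)
print("total retained:", pca.explained_variance_ratio_.sum())

# Map the 2D points back to 10 dimensions to measure what was lost.
X_back = pca.inverse_transform(X_2d)
reconstruction_error = np.mean((X - X_back) ** 2)
print("mean squared reconstruction error:", reconstruction_error)
```

On real embeddings the retained share for 2 components is usually much lower than in this toy data, which matches the remark that the 2D plot loses a lot of information.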
I will refer you to [Prof. Andrew’s excellent explanation in the 3rd course of the Machine Learning Specialization](https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning/lecture/mqAH4/pca-algorithm-optional).
There are many dimensionality reduction algorithms that can transform n-dimensional vectors into 2D space, which is convenient for visualization: PCA, t-SNE, LDA, etc. There is a good article about them on Wikipedia, where you can probably find one that fits your needs.
E.g. my friends use t-SNE to visualize long vectors that represent website visit journeys.
Sklearn includes Python implementations of many of these methods, for example sklearn.manifold.TSNE.
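For reference, a minimal t-SNE sketch with scikit-learn might look like the following. The input is synthetic (3 made-up clusters of 20 points in 50 dimensions), and the `perplexity` value is an arbitrary choice for this small dataset, not a recommendation.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Assumption: synthetic data, 3 clusters of 20 points each in 50 dimensions.
centers = rng.normal(scale=5.0, size=(3, 50))
X_long = np.vstack([c + rng.normal(size=(20, 50)) for c in centers])

# perplexity must be smaller than the number of samples (60 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
tsne_2d = tsne.fit_transform(X_long)

print("t-SNE output shape:", tsne_2d.shape)
```

Unlike PCA, t-SNE has no transform for new points and its axes have no direct meaning, so it is typically used for plotting only.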