Does having multiple embeddings in a model affect training?

Deepti_Prasad · September 9, 2023, 1:20pm

I ask this because Prof. Ng, while going through embeddings, mentioned that when an embedding is reduced to two dimensions with PCA for 2D visualisation, the new values lose a lot of information but are easier to plot. How does the embedding decide which data to keep in the vector and which to leave out?

Based on the above doubt, won't losing information affect the quality of the vectorisation? For example, if I vectorise 4 sentences, will any major part of a sentence be lost?

lukmanaj · September 10, 2023, 3:07am

From my understanding of the lectures, using PCA could lead to some information loss, but PCA is an algorithm that automatically determines the most important aspects to retain (the directions along which the data varies most). The information lost will be less important than what is retained.
But that is just for visualising the relationship.

For computing pairwise similarities, even in the lecture, the full embeddings are used. Likewise, the full embeddings will be used for the training. So there’s no concern for information loss. At least from my understanding of the explanation in the lectures.
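A minimal sketch of the point above, assuming random vectors as a stand-in for real sentence embeddings (the 768-dimensional size and all variable names are my own choices, not from the lecture). Similarities are computed on the full vectors, and the 2D PCA projection is a separate step used only for plotting:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for real embeddings: 4 sentences, 768 dimensions each.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 768))

# Pairwise similarity uses the FULL embeddings, so nothing is discarded here.
full_sim = cosine_similarity(embeddings)  # shape (4, 4)

# PCA is applied separately, only to get 2D points for a plot.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(embeddings)  # shape (4, 2)

# Fraction of the total variance that survives in the 2D view.
retained = pca.explained_variance_ratio_.sum()
print(f"Variance retained in 2D: {retained:.2f}")
```

The `embeddings` array is never modified by PCA, so whatever is lost in `points_2d` does not affect similarity computation or training.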

Deepti_Prasad · September 10, 2023, 6:55am

Can I learn more about the PCA algorithm, so that I can understand how it decides which information to keep and which to lose?

lukmanaj · September 10, 2023, 7:17am

The PCA algorithm takes data in many dimensions and reduces it to fewer dimensions, say two or three. The most important use case is data visualization. It defines new axes and projects the data onto them, taking care to ensure minimal loss of information.
It can be implemented using Sklearn.
I will refer you to [Prof. Andrew’s excellent explanation in the 3rd course of the Machine Learning Specialization](https://www.coursera.org/learn/unsupervised-learning-recommenders-reinforcement-learning/lecture/mqAH4/pca-algorithm-optional).
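To make the "new axes" idea concrete, here is a rough sketch of PCA done by hand with NumPy, compared against Sklearn. The toy data and variable names are assumptions of mine for illustration. It uses the standard SVD formulation, which may differ in presentation from the lecture:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data (an assumption): 100 points in 5D, with very different spread per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Step 1: center the data so each feature has zero mean.
Xc = X - X.mean(axis=0)

# Step 2: SVD of the centered data. The rows of Vt are the new axes,
# already sorted from the largest variance to the smallest.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
axes = Vt[:2]  # keep the two highest-variance directions

# Step 3: project the data onto those two axes.
manual_2d = Xc @ axes.T

# Share of the total variance carried by each axis; the dropped axes
# are the ones with the smallest shares.
variance_share = S**2 / (S**2).sum()
print("Variance share per axis:", np.round(variance_share, 3))

# Sklearn should give the same projection, up to the sign of each axis.
sk_2d = PCA(n_components=2).fit_transform(X)
matches = np.allclose(np.abs(manual_2d), np.abs(sk_2d), atol=1e-6)
print("Matches Sklearn up to sign:", matches)
```

So PCA does not pick which words or sentences to keep. It keeps the directions in which the data varies most and drops the directions with the least variation.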

agershun · September 11, 2023, 2:58am

There are many dimensionality reduction algorithms that can transform n-dimensional vectors into 2D space, which is convenient for visualization: PCA, t-SNE, LDA, etc. There is a good article about them on Wikipedia, where you can probably find one that fits your needs.

E.g. my friends use t-SNE to visualize long vectors that represent website visit journeys.

Sklearn includes Python functions for many of these methods, for example sklearn.manifold.TSNE
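
A small sketch of how that might look, assuming random vectors in place of real data (the array sizes and parameter values here are arbitrary choices of mine, not a recommendation):

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for real long vectors: 50 items, 64 dimensions each.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(50, 64))

# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
points_2d = tsne.fit_transform(vectors)  # shape (50, 2)

print(points_2d[:3])
```

As with PCA, the 2D points are only for plotting; the original `vectors` are left unchanged.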
