In the NLP course I learnt that an embedding layer is used to represent a word in n-dimensional space, where n can be any natural number. But we can only visualize 3D, so does https://projector.tensorflow.org/ project the n-dimensional data down to 3D? If yes, does this mean some information is lost?
Hello @tbhaxor,
My answer is yes and yes.
From here we can see that it first uses PCA to find those principal “components”, where each component carries some percentage of the total variance. Then we can select up to 3 components to represent a data point. So, we are going to lose the information that the other components carry.
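Here is a minimal sketch of what that looks like with scikit-learn's PCA (the 1,000 x 300 "embeddings" below are just random numbers standing in for real word vectors):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for 1,000 word embeddings with 300 dimensions each
# (random values, only to illustrate the mechanics).
embeddings = np.random.rand(1000, 300)

# Project down to the 3 components with the largest variance,
# which is what a 3D view like the Embedding Projector plots.
pca = PCA(n_components=3)
projected = pca.fit_transform(embeddings)   # shape (1000, 3)

# Fraction of the total variance carried by the 3 selected components.
# Everything else (1 minus this sum) is the information we give up.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```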
Raymond
Hi @rmwkwok @tbhaxor ,
You are absolutely right. If we use PCA, t-SNE, etc., there will be a loss of information, because those methods try to capture the maximum variance of the data in fewer dimensions. It is a trade-off between having hundreds of variables to deal with and not being able to visualize them, versus having a few axes/principal components to work with. So complexity vs. simplicity with a bit of loss is worth it.
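To make that trade-off concrete, scikit-learn's PCA can even pick the number of components needed to keep a chosen fraction of the variance. A small sketch on a toy dataset (the 95% threshold is just an example):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Keep as many principal components as needed to retain ~95% of the
# variance, instead of fixing the number of components up front.
X = load_digits().data          # 64 original features
pca = PCA(n_components=0.95)    # smallest k that reaches 95% of the variance
X_reduced = pca.fit_transform(X)

print(X.shape[1], "->", pca.n_components_, "components")
print("variance kept:", pca.explained_variance_ratio_.sum())
```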
In addition to the very good previous answers, I want to add a few things for completeness:
Often yes (probably 99% of all examples I have seen in data science), but only if the new space has a lower dimension than the original space. So this statement is not true without exception…
Let’s take PCA. It’s closely related to the singular value decomposition: one application is the modal transformation in structural dynamics, which is done to decouple interactions in the system so that eigenfrequencies, eigenmodes, etc. can be analysed more easily.
(Often, of course, the benefit of model order reduction is used in this context, too! If done well, almost the same accuracy can be achieved with much better computational performance, which is often the way to go.) But I want to highlight that singular value decomposition (SVD) or PCA could, in theory, also be performed in the full original space without loss of information, because in the end only a linear transformation is done.
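A quick numerical sketch of that last point (arbitrary random matrix, NumPy only): keeping all singular values is just a change of basis and reconstructs the data exactly, while truncation is where the loss comes in.

```python
import numpy as np

# Toy data matrix: 5 samples, 4 features (arbitrary values).
X = np.random.rand(5, 4)

# Full SVD: X = U @ diag(S) @ Vt. Keeping ALL singular values is only a
# linear change of basis, so the reconstruction is exact.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X, U @ np.diag(S) @ Vt))   # True: no information lost

# Truncating to the top k singular values introduces the approximation
# (and hence the information loss) used for dimensionality reduction.
k = 2
X_trunc = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
print(np.linalg.norm(X - X_trunc))           # reconstruction error > 0 in general
```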
If you are interested in structural dynamics, feel free to take a look.
Hope that helps!
Best regards
Christian
Hi @Christian_Simonis,
In your opinion and experience, what is the best dimensionality reduction algorithm to use: PCA, LDA, SVD, t-SNE, UMAP, etc.?
I believe this really depends on what you want to achieve. Do you have a concrete example (e.g. a visualization or anomaly detection, etc.) of what you want to do after reducing the feature space?
Since you asked for my personal experience:
Personally, I think PCA is great since it helps with data understanding, and its transformations are interpretable because they are linear operations. Things get more difficult when you want to deal with non-linearity, though. Kernel PCA might help here, but if you have a really large amount of data you could also think about a deep-learning-based approach to learn your embeddings, e.g. with autoencoders, which can be very effective!
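As a small illustration of the non-linearity point (toy concentric-circles data; the RBF kernel and gamma are chosen just for this example):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: linear PCA can only rotate this data,
# while an RBF-kernel PCA can "unfold" it.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = PCA(n_components=2).fit_transform(X)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Difference of the class means along the first component:
# close to zero for linear PCA, clearly non-zero for kernel PCA.
print("linear PCA :", abs(linear[y == 0, 0].mean() - linear[y == 1, 0].mean()))
print("kernel PCA :", abs(kpca[y == 0, 0].mean() - kpca[y == 1, 0].mean()))
```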
Note: if only a small number of labels is available, Siamese networks can sometimes be very powerful for learning embeddings, depending on what you want to achieve and whether you meet the data requirements.
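In case it helps, here is a very rough sketch of the Siamese idea with Keras. Everything here (dimensions, the random pair data, the margin) is made up for illustration; the point is only the shared encoder plus a contrastive loss on pair distances:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

input_dim, embed_dim = 100, 8   # assumed feature and embedding sizes

# Shared encoder: both inputs of a pair go through the same weights.
encoder = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(input_dim,)),
    layers.Dense(embed_dim),
])

x_a = layers.Input(shape=(input_dim,))
x_b = layers.Input(shape=(input_dim,))
z_a, z_b = encoder(x_a), encoder(x_b)

# Euclidean distance between the two embeddings of a pair.
dist = layers.Lambda(
    lambda t: tf.sqrt(tf.reduce_sum(tf.square(t[0] - t[1]), axis=1, keepdims=True) + 1e-9)
)([z_a, z_b])

siamese = Model([x_a, x_b], dist)

def contrastive_loss(y_true, d, margin=1.0):
    # Similar pairs (label 1) are pulled together, dissimilar pairs (label 0)
    # are pushed at least `margin` apart.
    y_true = tf.cast(y_true, d.dtype)
    return tf.reduce_mean(y_true * tf.square(d) +
                          (1.0 - y_true) * tf.square(tf.maximum(margin - d, 0.0)))

siamese.compile(optimizer="adam", loss=contrastive_loss)

# Random placeholder pairs and labels, only to show the fit call.
pairs_a = np.random.rand(256, input_dim).astype("float32")
pairs_b = np.random.rand(256, input_dim).astype("float32")
labels = np.random.randint(0, 2, size=(256, 1)).astype("float32")
siamese.fit([pairs_a, pairs_b], labels, epochs=2, batch_size=32, verbose=0)

# After training, the encoder alone maps samples into the embedding space.
embeddings = encoder.predict(pairs_a, verbose=0)
```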
I also played around with some other methods you mentioned, like t-SNE, but only for visualization purposes.
Best regards
Christian
I agree with what you said there. I mostly use PCA and kernel PCA when I deal with data that has fewer than 1,000 columns. I haven't worked on bigger datasets. The datasets I work on are usually numerical, so I am not sure if autoencoders would be a good option. Thanks!