In the NLP course I learned that an embedding layer is used to represent a word in n-dimensional space, where n can be any natural number. But we can only visualize up to 3D, so does https://projector.tensorflow.org/ project the n-d data down to 3D? If yes, does this mean some information is lost?

Hello @tbhaxor,

My answer is yes and yes.

From here we can see that it first uses PCA to find those principal "components", where each component carries some percentage of the total variance. Then we can select up to 3 components to represent a data point. So, we are going to lose the information that the other components carry.
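If it helps, here is a minimal sketch of that idea (scikit-learn, with random data standing in for real word embeddings): keep 3 components and check how much variance the discarded components carried.

```python
# A minimal sketch of what the Embedding Projector's PCA view does,
# using scikit-learn and random data in place of real word embeddings.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 50))   # 1000 "words", 50-d embeddings

pca = PCA(n_components=3)
points_3d = pca.fit_transform(embeddings)  # shape (1000, 3), ready to plot

kept = pca.explained_variance_ratio_.sum()
print(f"variance kept: {kept:.1%}, variance lost: {1 - kept:.1%}")
```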

Raymond

Hi @rmwkwok @tbhaxor ,

You are absolutely right. If we use PCA, t-SNE, etc., there will be a loss of information, because those methods try to capture the maximum variance of the data in fewer dimensions. It is a trade-off between having hundreds of variables that you cannot visualize, versus having a few axes/principal components you can work with. So trading complexity for simplicity, at the cost of a bit of loss, is worth it.

In addition to the very good previous answers, I want to add a few things for completeness:

Often yes (probably in 99% of all examples I have seen in data science), but only if the new space has fewer dimensions than the original space. So this statement is not true without exception…

Let's take PCA. It's closely related to the singular value decomposition: one application is the modal transformation in structural dynamics, which is done to decouple interactions in the system so that it is easier to analyse e.g. eigenfrequencies, eigenmodes etc.

(Often, of course, the benefit of model order reduction is exploited in this context, too! Almost the same accuracy with much better computational performance can be achieved if done well, which is often the way to go.) But I want to highlight that a singular value decomposition (SVD) or PCA could, in theory, also be performed in the full original space without loss of information, because in the end only a **linear transformation** is applied.
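To illustrate that last point, here is a small sketch (scikit-learn, random data for illustration): if you keep all components, PCA is an invertible linear map, and the original data can be recovered exactly.

```python
# A small sketch (scikit-learn, random data): keeping ALL components makes
# PCA an invertible linear transformation, so nothing is lost.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))

pca = PCA(n_components=10)                  # full original dimensionality
X_reduced = pca.fit_transform(X)
X_restored = pca.inverse_transform(X_reduced)

print(np.allclose(X, X_restored))           # True: no information lost
```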

If you are interested in structural dynamics, feel free to take a look.

Hope that helps!

Best regards

Christian

Hi @Christian_Simonis,

In your opinion and experience, what is the best dimensionality reduction algorithm to use: PCA, LDA, SVD, t-SNE, UMAP, etc.?

I believe this really depends on what you want to achieve. Do you have a concrete example (e.g. a visualization or anomaly detection, etc.) of what you want to do after reducing the feature space?

Since you asked for my personal experience:

Personally, I think PCA is great since it helps with data understanding, and its transformations are interpretable because they are purely linear; but things get more difficult when you want to deal with non-linearity. Kernel PCA might help here, but if you have a really large amount of data you could also think about a deep-learning-based approach to learn your embeddings, e.g. with autoencoders, which can be very effective!
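As a small sketch of the kernel PCA point (the concentric-circles toy data and the gamma=10 value are just illustrative assumptions, not a recommendation):

```python
# Sketch: two concentric circles are not separable along any linear PCA
# component, but an RBF-kernel PCA component often separates them.
# The dataset and gamma=10 are illustrative assumptions only.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=1).fit_transform(X)
X_kpca = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)

# Per-class means along the first component: roughly equal for linear PCA,
# clearly separated for kernel PCA.
print(f"linear PCA: {X_pca[y == 0].mean():+.3f} vs {X_pca[y == 1].mean():+.3f}")
print(f"kernel PCA: {X_kpca[y == 0].mean():+.3f} vs {X_kpca[y == 1].mean():+.3f}")
```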

Note: if only a small number of labels is available, Siamese networks can sometimes be very powerful for learning embeddings, depending on what you want to achieve and whether you fulfill the data requirements.

I have also played around with some of the other methods you outlined, like t-SNE, but only for visualization purposes.
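For what it's worth, a minimal sketch of that visualization use case (scikit-learn's 64-dimensional digits data, chosen just for illustration):

```python
# Minimal t-SNE visualization sketch on scikit-learn's digits dataset;
# t-SNE here is only for plotting, not a general-purpose transform.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE of the digits dataset")
plt.show()
```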

Best regards

Christian

I agree with what you said there. I mostly use PCA and kernel PCA when I deal with data that has fewer than 1,000 columns; I haven't worked on bigger datasets. The datasets I work on are usually numerical, so I am not sure if autoencoders would be a good option. Thanks!