I was a little confused at the very end of the lab assignment. I clearly understood the theory behind computing word embeddings with CBOW, but I'm not sure whether the final explanation refers to the kind of validation you want us to perform to decide whether our embeddings are correct. It says: '… we have to be careful with the interpretation of these projected word vectors, since the PCA depends on the projection – as shown in the following illustration.'
In the last section, we take the embeddings and perform PCA to produce some scatter plots. At first glance, I assumed a 2D projection would be enough to analyze the embeddings intrinsically. But could you use any other pair of axes, even from a higher-dimensional PCA projection, to compare the embeddings?
What I mean is: after training, we do a 2D projection of a set of previously selected words, and it looks like this:
After that, the lab mentions that we need to be careful when interpreting the projection, so I started playing around with combinations of axes to see how it behaves. With a 4D projection, taking the first and the last axes, I obtained this:
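The experiment described above can be sketched roughly like this (a minimal illustration, not the lab's actual code; the variable names `embeddings` and `words` and the random stand-in vectors are placeholders for whatever the lab trains with CBOW):

```python
import numpy as np

# Stand-in for trained CBOW embeddings: 6 words, 50 dimensions.
rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "apple", "orange"]
embeddings = rng.normal(size=(len(words), 50))

# PCA via SVD of the centered data matrix.
centered = embeddings - embeddings.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ Vt[:4].T  # first 4 principal components, shape (6, 4)

# The usual 2D scatter uses components 0 and 1 ...
xy_01 = proj[:, [0, 1]]
# ... but picking components 0 and 3 gives the same words
# a very different-looking layout.
xy_03 = proj[:, [0, 3]]
```

Plotting `xy_01` versus `xy_03` with the same word labels makes the point of the question concrete: the relative positions of the words depend heavily on which pair of axes you choose.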
So my confusion lies in what to take as ground truth when analyzing the embeddings. Does it depend on what kind of relations I want my embeddings to capture? Does it depend on the downstream task I want to implement, so that I need to preserve certain relations among the embeddings? How do I determine what is missing, or what to fine-tune, to produce better embeddings?
That is a good question. The most sensible answer is that you cannot take anything as ground truth, because, as you saw just by playing around, you can get almost any result you want (for example, you can show that "king" is near "queen", or that "king" is far from "queen").
But these plots are not totally useless. You can use them as complementary information (one of many pieces of evidence) to confirm or reject a hypothesis, but only if you formulate the hypothesis before looking at the plots; otherwise you are just picking the projection that suits you.
Another use case is exploring for interesting patterns. Looking at the plots may suggest ideas, but again you have to check whether these are spurious correlations or something worth exploring more deeply.
So to sum it up: they are not very reliable, but they can be useful.
Is it possible that we are supposed to compare the dimensions with the highest information? So even with X=4 in PCA, comparing dimensions 0 and 1 would give results more similar to X=2 than comparing dimensions 2 and 3, or 1 and 3. Right?
Exactly. PCA orders the components by the amount of variance they capture: component 0 carries the most, component 1 less (but more than component 2), and so on.
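This ordering can be checked directly. A small sketch (synthetic data, not from the lab): the singular values from the SVD come back sorted, so the per-component explained-variance ratios are automatically in decreasing order.

```python
import numpy as np

# Synthetic data whose columns have deliberately unequal variances.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10)) * np.arange(10, 0, -1)

# PCA via SVD of the centered data.
centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Variance captured by each principal component, as a fraction of the total.
var = S**2 / (len(X) - 1)
ratio = var / var.sum()

# Component 0 explains the most variance, component 1 the next most, etc.
assert all(ratio[i] >= ratio[i + 1] for i in range(len(ratio) - 1))
```

So yes: with a 4-component PCA, the plot of components 0 and 1 is the same as the 2-component plot, while pairs like (2, 3) only show whatever little variance is left over.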
You can play around with these components, but be careful: what you see is not necessarily the truth; it may just be what you want to see.