Could someone explain why theta for a given word is not expected to be close to the word embedding vector of that same word after training? My logic says they should be identical…

Does this have to do with cosine similarity, which is computed by a specific formula? The word embedding, on the other hand, is a latent-space representation generated by the model, so how can the two be the same?
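To make the question concrete, here is a minimal sketch (the matrix names `E` and `Theta` are my own, and the sizes are toy values) of how, in a skip-gram-style model, the embedding e_w and the output parameter theta_w are rows of two *separate* parameter matrices, so nothing forces them to be similar:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 4

# Two separate parameter matrices, as in skip-gram:
# E holds the input (word) embeddings e_w, while Theta holds
# the output-layer vectors theta_w used in the softmax score
# theta_c^T e_w for a context word c given a center word w.
E = rng.normal(size=(vocab_size, dim))
Theta = rng.normal(size=(vocab_size, dim))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# For any word w, e_w and theta_w are independent parameters
# updated by different gradients, so their cosine similarity
# can be anywhere in [-1, 1].
w = 2
sim = cosine(E[w], Theta[w])
print(sim)
```

This is only an illustration of the parameterization, not a trained model; the point is that `E[w]` and `Theta[w]` are distinct vectors that the training objective never ties together.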