Difference between word2vec and Transformers (and GloVe and BERT)?

Could you please explain what the key differences are? In simpler words.

Hi @Oleksandra_Sopova,

I shall try my best.

In simple terms:

word2vec, as the name tells you, creates vectors of words. These vectors are not context-dependent: each word always gets the same vector, no matter what sentence it appears in.

For example, for the word “Sydney”, consider these sentences.

  • This is in Sydney.
  • This is Sydney.

word2vec will always map the word “Sydney” to one specific vector (for simplicity, imagine that vector is just the number 1034). Even though in the first sentence we know Sydney is a city and in the second it is someone’s name, word2vec will output that same value, 1034, in both cases. Hence, it is not context-dependent.
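A toy illustration of this point (not real word2vec, and the vectors are made up): a static embedding is just a lookup table, so “Sydney” gets the exact same vector in every sentence.

```python
# Toy static embedding table (made-up 2-dimensional vectors,
# NOT trained word2vec values).
static_embeddings = {
    "sydney": [0.9, 0.1],
    "this":   [0.2, 0.4],
    "is":     [0.3, 0.3],
    "in":     [0.5, 0.2],
}

def embed_static(sentence):
    # Each word is looked up independently; context is ignored.
    return [static_embeddings[w] for w in sentence.lower().split()]

a = embed_static("This is in Sydney")
b = embed_static("This is Sydney")

# "Sydney" (the last word) gets an identical vector in both sentences.
assert a[-1] == b[-1]
```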

Transformers, on the other hand, are context-dependent. A Transformer looks at the other words around the word in question, “Sydney” in our case, and produces an output that depends on that context. The same split answers the rest of your question: GloVe, like word2vec, produces static vectors, while BERT, being Transformer-based, produces context-dependent ones.

You can read this article to get a better understanding.