Difference between word2vec and Transformers (and GloVe and BERT)?

Could you please explain what the key differences are? In simpler words.

Hi @Oleksandra_Sopova,

I shall try my best.

In simple terms:

word2vec, as the name tells you, creates vectors of words. These vectors are not context-dependent: each word always gets the same vector, no matter what sentence it appears in.

For example, for the word “Sydney”, consider these sentences.

  • This is in Sydney.
  • This is Sydney.

word2vec will always map the word “Sydney” to one specific vector (for simplicity, imagine that vector is just the number 1034). Even though in the first sentence we know Sydney is a city and in the second it is someone’s name, word2vec will output that same value, 1034, in both cases. Hence, it is not context-dependent.
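A toy illustration of this point (not real word2vec, and the vectors are made up): a static embedding is just a lookup table, so “Sydney” gets the exact same vector in every sentence.

```python
# Toy static embedding table (made-up 2-dimensional vectors,
# NOT trained word2vec values).
static_embeddings = {
    "sydney": [0.9, 0.1],
    "this":   [0.2, 0.4],
    "is":     [0.3, 0.3],
    "in":     [0.5, 0.2],
}

def embed_static(sentence):
    # Each word is looked up independently; context is ignored.
    return [static_embeddings[w] for w in sentence.lower().split()]

a = embed_static("This is in Sydney")
b = embed_static("This is Sydney")

# "Sydney" (the last word) gets an identical vector in both sentences.
assert a[-1] == b[-1]
```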

Transformers, on the other hand, are context-dependent. A Transformer looks at the other words around the word in question, “Sydney” in our case, and produces an output that depends on that context. The same split answers the rest of your question: GloVe, like word2vec, produces static vectors, while BERT, being Transformer-based, produces context-dependent ones.

You can read this article to get a better understanding.