Why is GloVe not too expensive?

In the GloVe formula, we have a double summation over the vocabulary. In the given example with a dictionary of 10,000 words, that is a hundred million terms. Why is this model not too expensive to optimize?

Hello Meir,

It is good practice to always think about the performance of our models, so your question makes total sense. I am not an expert, but I will share my thoughts with you. All NLP models need to train on and process high volumes of text data; the difference is in how they extract knowledge from the corpus. GloVe transforms the corpus into a kind of multidimensional mathematical space, where you no longer work with words but with embeddings. Now think about other models such as BERT or GPT-2, which are complex neural network architectures with billions of parameters in some cases. It is less expensive to optimize a GloVe model than a GPT-2 one, particularly in terms of the compute time and hardware resources you will need.
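One concrete reason the hundred-million-term sum is not as scary as it looks: the GloVe weighting function f satisfies f(0) = 0, so any word pair that never co-occurs contributes nothing, and in a real corpus the co-occurrence matrix X is overwhelmingly sparse. A minimal sketch, with toy numbers I made up (V = 1,000 and roughly 1% non-zero entries are assumptions for illustration, not from the thread):

```python
import numpy as np

V, d = 1_000, 50  # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)

# A realistic co-occurrence matrix X is very sparse: most word pairs
# never co-occur. Simulate roughly 1% non-zero entries.
X = np.zeros((V, V))
nz = rng.integers(0, V, size=(V * V // 100, 2))
X[nz[:, 0], nz[:, 1]] = rng.integers(1, 100, size=len(nz))

def f(x, x_max=100.0, alpha=0.75):
    # GloVe weighting: f(0) = 0, so zero-count pairs drop out entirely.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0) * (x > 0)

# Parameters: word vectors, context vectors, and the two bias terms.
W = rng.normal(scale=0.1, size=(V, d))
C = rng.normal(scale=0.1, size=(V, d))
bw = np.zeros(V)
bc = np.zeros(V)

# The double sum  J = sum_{i,j} f(X_ij) (w_i . c_j + b_i + b_j - log X_ij)^2
# only has to visit the non-zero entries of X:
i_idx, j_idx = np.nonzero(X)
diff = ((W[i_idx] * C[j_idx]).sum(axis=1)
        + bw[i_idx] + bc[j_idx]
        - np.log(X[i_idx, j_idx]))
J = (f(X[i_idx, j_idx]) * diff ** 2).sum()

print(f"non-zero pairs visited: {len(i_idx)} out of {V * V} possible")
```

So each gradient pass costs time proportional to the number of non-zero co-occurrences, not to V², which is why the nominal "hundred million terms" never materialize in practice.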

I recommend reading this interesting article, which explains many details of the model.

Hope this gave you more context.



Hi Rosa,
Thank you for your input.
I am still not sure I understand why this is different from word2vec, where we were concerned about the expensive summation in the softmax and introduced negative sampling for exactly that reason…

Hi Meir,

Ah, so your question was really "why is GloVe less expensive than word2vec?". I think the answer is here. You were right that in terms of embedding quality they are similar, but the key is the implementation.

Hope this helps :slight_smile: