Deep N-grams (in Course 3: Natural Language Processing with Sequence Models, week 4) vs. Word Embeddings (built by CBOW in Course 2: Natural Language Processing with Probabilistic Models, week 4)

Hello, the Natural Language Processing with Sequence Models course discussed the benefits of deep n-grams over statistical n-grams, such as:

  • Reduced memory and disk space consumption when you have a large corpus.
  • GRU and LSTM n-grams outperform traditional RNN n-grams by capturing longer-range dependencies.
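To make the memory point concrete, here is a minimal toy sketch (my own illustration, not from the course): a statistical trigram count table keeps growing as the corpus grows, while a recurrent model's parameter count is fixed by its architecture.

```python
import random
from collections import Counter

random.seed(0)
vocab = [f"w{i}" for i in range(50)]

def trigram_table_size(n_sentences):
    """Count distinct trigrams in a random toy corpus of n_sentences."""
    counts = Counter()
    for _ in range(n_sentences):
        sent = random.choices(vocab, k=12)
        for tri in zip(sent, sent[1:], sent[2:]):
            counts[tri] += 1
    return len(counts)

# The count table keeps growing as more text is seen...
small = trigram_table_size(100)
large = trigram_table_size(10_000)
print(small < large)  # True

# ...while an RNN's parameter count is fixed by its architecture
# (embedding table + input weights + recurrent weights + output layer):
d, V = 128, len(vocab)
rnn_params = V * d + d * d + d * d + d * V
print(rnn_params)  # constant regardless of corpus size
```

The vocabulary, sentence length, and hidden size here are arbitrary; the point is only the growth pattern of the two storage costs.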

However, a comparison between deep (GRU, LSTM) n-grams and word embeddings (built by CBOW in Course 2: Natural Language Processing with Probabilistic Models, week 4) in terms of language modeling was not discussed. I do not know when to choose which method to build a more sophisticated language model.

Could someone give some insights/advice?

Hi @Hung_Nguyen1

I think you misunderstand what word embeddings are, because it's not about "deep (GRU, LSTM) n-grams against word embeddings".

Here are some threads about word embeddings:

And also some threads about RNNs:


Thank you for spending your time, @arvyzukai.
I understand that word embeddings are a way to represent text data in a machine-readable form. I modified the question because my wording might have caused confusion.
In sum, what I mean is that Word Embeddings (built by CBOW) constitute a language model, and Deep N-grams are also a language model. Could you give a comparison between them?

I’m happy to help @Hung_Nguyen1

A language model is a probabilistic model of a natural language that can assign probabilities to a series of words, based on the text corpora it was trained on.
So yes: both pure statistical models based on word n-grams (one of them is CBOW) and Recurrent Neural Network based language models are language models.
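For concreteness, here is a minimal sketch (my own toy example, not from the course) of a pure statistical bigram language model that assigns probabilities to word sequences by maximum likelihood:

```python
from collections import Counter

# Tiny toy corpus with sentence boundary markers.
corpus = [
    ["<s>", "i", "like", "deep", "learning", "</s>"],
    ["<s>", "i", "like", "nlp", "</s>"],
    ["<s>", "i", "enjoy", "deep", "learning", "</s>"],
]

# Count bigrams and their left contexts.
bigrams = Counter()
contexts = Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        bigrams[(w1, w2)] += 1
        contexts[w1] += 1

def prob(w2, w1):
    """P(w2 | w1) by maximum likelihood estimation."""
    return bigrams[(w1, w2)] / contexts[w1]

def sentence_prob(sent):
    """Probability of a sentence as a product of bigram probabilities."""
    p = 1.0
    for w1, w2 in zip(sent, sent[1:]):
        p *= prob(w2, w1)
    return p

print(prob("like", "i"))                                 # → 2/3
print(sentence_prob(["<s>", "i", "like", "nlp", "</s>"]))  # → 1/3
```

A real n-gram model would add smoothing for unseen bigrams; the sketch only shows how probabilities come from counts, which is exactly the storage-hungry part that neural language models replace with learned parameters.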

Comparing them by "sophistication", RNN-based language models are superior to the pure (classic) statistical models. Transformers, in turn, are even more sophisticated language models than RNN-based ones.
In simple words:

  • Transformers > RNN > N-gram

Word embeddings are an integral part of a language model, meaning that it's difficult to compare them outside the model ("their goal" is to fit the loss function, not to "look good" in PCA).
In contrast to static embeddings like CBOW's, RNNs' and Transformers' word representations are contextual (which simply means that, within a sequence, each word's embedding differs depending on the other words), so trying to directly compare them with CBOW word embeddings is somewhat difficult.
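The difference can be illustrated with a toy sketch (my own example with random, untrained weights, so it only shows the mechanics): a static embedding table gives "bank" the same vector in every sentence, while a recurrent model's representation of "bank" depends on the words before it.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "bank": 1, "river": 2, "money": 3}
d = 4

# Static embeddings (CBOW-style): one fixed vector per word.
E = rng.normal(size=(len(vocab), d))

def static_embed(tokens):
    return [E[vocab[t]] for t in tokens]

# Toy RNN: the hidden state mixes the current word with the left
# context, so the representation of a word depends on what preceded it.
Wx = rng.normal(size=(d, d))
Wh = rng.normal(size=(d, d))

def contextual_embed(tokens):
    h = np.zeros(d)
    out = []
    for t in tokens:
        h = np.tanh(E[vocab[t]] @ Wx + h @ Wh)
        out.append(h)
    return out

s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Static: "bank" gets the identical vector in both sentences.
same = np.allclose(static_embed(s1)[2], static_embed(s2)[2])
# Contextual: "bank" gets a different vector in each sentence.
diff = not np.allclose(contextual_embed(s1)[2], contextual_embed(s2)[2])
print(same, diff)  # True True
```

The vocabulary and dimensions are arbitrary; with trained weights the contextual vectors would additionally reflect the two senses of "bank", which is the property the cited paper tries to compare against static embeddings.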
I found this attempt, Comparing Contextual and Static Word Embeddings with Small Philosophical Data, not too convincing, but you can judge for yourself.