Deep n-grams (in Course 3: Natural Language Processing with Sequence Models, week 4) vs word embeddings (built by CBOW in Course 2: Natural Language Processing with Probabilistic Models, week 4)

Hello, Course 3: Natural Language Processing with Sequence Models discussed the benefits of deep n-grams over statistical n-grams, such as:

  • Reduced memory and disk space consumption when you have a large corpus.
  • GRU and LSTM n-gram models outperform traditional RNN n-grams by being able to capture longer-range dependencies.

However, a comparison between deep (GRU, LSTM) n-grams and word embeddings (built by CBOW in Course 2: Natural Language Processing with Probabilistic Models, week 4) in terms of language modeling was not discussed. I do not know when to choose which method to build a more sophisticated language model.

Could someone give some insights/advice?

Hi @Hung_Nguyen1

I think you misunderstand what word embeddings are, because it is not a matter of “deep (GRU, LSTM) n-grams versus word embeddings”.

Here are some threads about word embeddings:

And also some threads about RNNs:

Cheers

Thank you for taking the time, @arvyzukai.
I understand that word embeddings are a way to represent text data in a machine-readable form. I modified the question because my wording might have caused confusion.
In sum, what I mean is that Word Embeddings (built by CBOW) is a language model and Deep N-grams is a language model too. Could you give a comparison between them?

I’m happy to help @Hung_Nguyen1

A language model is a probabilistic model of a natural language that can assign probabilities to a series of words, based on the text corpora it was trained on.
So yes - both pure statistical models based on word n-grams (one of them being CBOW) and Recurrent Neural Network based language models are language models.
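
For intuition, here is a minimal sketch (toy data and toy corpus I made up, not course code) of what a pure count-based n-gram language model does: it assigns a probability to a word sequence from corpus counts alone.

```python
# Toy bigram language model: P(w1..wn) ~ product of P(w_i | w_{i-1}) from raw counts.
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def sequence_probability(words):
    """Maximum-likelihood bigram probability of a word sequence."""
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        prob *= bigram_counts[(prev, curr)] / unigram_counts[prev]
    return prob

print(sequence_probability("the cat sat".split()))  # 0.25: both bigrams were seen
print(sequence_probability("the mat sat".split()))  # 0.0: ("mat", "sat") never occurs
```

This also shows why memory becomes a problem: the counts table grows with every distinct n-gram in the corpus, while a neural model keeps a fixed set of weights.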

Comparing them by “sophistication”, RNN-based language models are superior to the pure (classic) statistical models, and Transformers, in turn, are even more sophisticated language models than RNN-based ones.
In simple words:

  • Transformers > RNN > N-gram
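
To make that ranking concrete, here is a toy sketch (random, untrained weights; a hypothetical vocabulary, nothing from the course) of why an RNN language model is not limited to a fixed window: its hidden state summarizes the entire prefix, whereas an n-gram model only ever sees the last n-1 words.

```python
# Toy RNN language model forward pass: P(next word | entire prefix).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<s>", "the", "cat", "sat", "on", "mat"]
V, H = len(vocab), 8                  # vocabulary size, hidden size

E  = rng.normal(size=(V, H))          # input word embeddings
Wh = rng.normal(size=(H, H)) * 0.1    # hidden-to-hidden weights
Wo = rng.normal(size=(H, V)) * 0.1    # hidden-to-vocab weights

def next_word_distribution(prefix):
    """Run the RNN over the prefix and return a softmax over the next word."""
    h = np.zeros(H)
    for w in prefix:
        h = np.tanh(E[vocab.index(w)] + Wh @ h)   # state depends on all words so far
    logits = h @ Wo
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

p = next_word_distribution(["<s>", "the", "cat"])
print({w: round(float(pi), 3) for w, pi in zip(vocab, p)})
```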

Word embeddings are an integral part of a language model, meaning that it is difficult to compare them outside the model (“their goal” is to fit the loss function, not to “look good” in PCA).
In contrast to statistical models, the word embeddings used by RNNs and Transformers are contextual representations (which simply means that within a sequence, each word’s embedding differs depending on the other words), so trying to compare them directly with CBOW word embeddings is somewhat difficult.
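
As a toy illustration of that last point (random, untrained weights; a made-up vocabulary, not course code): a static lookup table always returns the same vector for a word, while an RNN’s hidden state at that word depends on the words before it.

```python
# Toy contrast between "static" and "contextual" word vectors: a CBOW-style
# lookup table gives "bank" one fixed vector, while an RNN's hidden state at
# "bank" changes with the surrounding words.
import numpy as np

rng = np.random.default_rng(1)
vocab = ["the", "river", "money", "bank"]
V, H = len(vocab), 6

E  = rng.normal(size=(V, H))        # static embedding table (what CBOW learns)
Wh = rng.normal(size=(H, H)) * 0.1  # toy recurrent weights

def static_vector(word):
    return E[vocab.index(word)]     # the sentence plays no role at all

def rnn_state_at(sentence, word):
    h = np.zeros(H)
    for w in sentence:              # hidden state absorbs the whole prefix
        h = np.tanh(E[vocab.index(w)] + Wh @ h)
        if w == word:
            return h

s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Static: the same vector for "bank" no matter the sentence.
print(np.allclose(static_vector("bank"), static_vector("bank")))        # True
# Contextual: the RNN's representation of "bank" differs between sentences.
print(np.allclose(rnn_state_at(s1, "bank"), rnn_state_at(s2, "bank")))  # False
```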
I found this attempt, “Comparing Contextual and Static Word Embeddings with Small Philosophical Data”, not too convincing, but you can judge for yourself.

Cheers
