In week 3, one of the smoothing methods taught is linear interpolation. But in the lecture it seems like we are using the trigram probability to estimate the same trigram probability. In my opinion, the trigram probability should be estimated as a linear combination of the bigram and unigram probabilities.
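To be concrete: I would expect something like P_hat(w_n | w_{n-2}w_{n-1}) = λ_2 P(w_n | w_{n-1}) + λ_3 P(w_n), but the formula in the lecture also includes a λ_1 P(w_n | w_{n-2}w_{n-1}) term on the right-hand side.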
Please let me know your thoughts on this.
Hi @Naman_Chhibbbar
It really depends on your application. Sometimes one method leads to better results than the other, and it's hard to guess beforehand (tokenization is a factor here too: for example, a subword model would care less about bigram and unigram probabilities, and a character-level model would probably not care about them at all). So, in the end, what matters is which method leads to better results.
Cheers
Hey @arvyzukai, thank you for replying!
I think you misunderstood my question. What I am trying to ask is why we estimate the trigram probability by including the trigram probability itself in the linear combination with the lower-order n-grams, as shown in the lecture.
Please let me know if I am missing something.
Hi @Naman_Chhibbbar
Ah… I think you're mixing up back-off and interpolation.
For back-off, when a trigram is missing we can use lower-order n-grams to estimate the trigram probability (instead of it being 0, thus smoothing). I think this is the case you have in mind.
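Roughly, back-off looks something like this (a minimal sketch assuming plain dictionaries of raw counts; the names and the alpha discount, the "stupid back-off" variant, are just illustrative, not the course's exact implementation):

```python
# Minimal "stupid back-off" sketch, assuming dicts mapping n-gram tuples to raw counts.
# (Illustrative only; the alpha discount is a common heuristic, not the lecture's exact method.)

def backoff_prob(w1, w2, w3,
                 trigram_counts, bigram_counts, unigram_counts,
                 total_words, alpha=0.4):
    """Estimate P(w3 | w1 w2), falling back to lower-order n-grams when the trigram is unseen."""
    if trigram_counts.get((w1, w2, w3), 0) > 0:
        # Trigram was seen: plain relative-frequency estimate.
        return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]
    if bigram_counts.get((w2, w3), 0) > 0:
        # Back off to the bigram, discounted by alpha.
        return alpha * bigram_counts[(w2, w3)] / unigram_counts[w2]
    # Back off all the way to the unigram.
    return alpha * alpha * unigram_counts.get(w3, 0) / total_words
```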
For interpolation, we use lower-order n-grams to smooth the trigram probability. And as I mentioned, it really depends on your application whether you want to do that or not. In other words, the trigram probabilities you end up with after training only matter for your application, because the trigram model is “crude” anyway.
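And interpolation, in the same spirit (again just a sketch with illustrative names; the lambda values are placeholders you would normally tune on held-out data):

```python
# Minimal linear-interpolation sketch (illustrative names; the lambdas must sum to 1).

def interpolated_prob(w1, w2, w3,
                      trigram_counts, bigram_counts, unigram_counts,
                      total_words, lambdas=(0.6, 0.3, 0.1)):
    """P_hat(w3 | w1 w2) = l1*P(w3 | w1 w2) + l2*P(w3 | w2) + l3*P(w3)."""
    l1, l2, l3 = lambdas

    tri_context = bigram_counts.get((w1, w2), 0)
    p_tri = trigram_counts.get((w1, w2, w3), 0) / tri_context if tri_context else 0.0

    bi_context = unigram_counts.get(w2, 0)
    p_bi = bigram_counts.get((w2, w3), 0) / bi_context if bi_context else 0.0

    p_uni = unigram_counts.get(w3, 0) / total_words

    return l1 * p_tri + l2 * p_bi + l3 * p_uni
```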
But also notice that interpolation works together with back-off too:
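Writing the interpolated estimate out (roughly as in the lecture):

P_hat(w_n | w_{n-2}w_{n-1}) = λ_1 P(w_n | w_{n-2}w_{n-1}) + λ_2 P(w_n | w_{n-1}) + λ_3 P(w_n), where λ_1 + λ_2 + λ_3 = 1.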
Here, if P(w_n | w_{n-2}w_{n-1}) = 0, then the trigram estimate reduces to the weighted sum of the lower-order n-gram probabilities.
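For example (with made-up numbers): if λ = (0.6, 0.3, 0.1), P(w_n | w_{n-1}) = 0.2 and P(w_n) = 0.05, an unseen trigram still gets 0.6·0 + 0.3·0.2 + 0.1·0.05 = 0.065 instead of 0.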
Cheers