C2_W3_lecture_nb_03_oov - smoothing

I want to suggest some changes to the code in the “smoothing” part of the lab notebook, as below. I would be happy to hear any thoughts.


First, I think there was an unintended typo in the function name: the ‘h’ was missing in “smoothing”, so I suggest renaming the function:
add_k_smooting_probability → add_k_smoothing_probability


Second, in the code below from the lecture notebook:

trigram_probabilities = {('i', 'am', 'happy') : 2}
bigram_probabilities = {( 'am', 'happy') : 10}

I think it makes sense to change the bigram to ('i', 'am'), which is the first two words of the trigram.

trigram_probabilities = {('i', 'am', 'happy') : 2}
bigram_probabilities = {( 'i', 'am') : 10}

Third, in the code that computes probability_unknown_trigram, I think passing bigram_probabilities[('i', 'am')] as the n_gram_prefix_count argument makes more sense:

probability_unknown_trigram = add_k_smoothing_probability(k, vocabulary_size, 
    n_gram_count=0, n_gram_prefix_count=bigram_probabilities[('i', 'am')])
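
For context, here is how I understand the function to work. This is only a minimal sketch of the standard add-k estimate; the notebook's actual implementation, and its values of k and vocabulary_size, may differ:

def add_k_smoothing_probability(k, vocabulary_size, n_gram_count, n_gram_prefix_count):
    # add-k estimate: (count of the n-gram + k) / (count of its prefix + k * |V|)
    numerator = n_gram_count + k
    denominator = n_gram_prefix_count + k * vocabulary_size
    return numerator / denominator

# placeholder values, just to make the call above runnable
k = 1
vocabulary_size = 5
bigram_probabilities = {('i', 'am'): 10}

probability_unknown_trigram = add_k_smoothing_probability(
    k, vocabulary_size,
    n_gram_count=0, n_gram_prefix_count=bigram_probabilities[('i', 'am')])
print(probability_unknown_trigram)  # (0 + 1) / (10 + 1 * 5) ≈ 0.067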

Hi @younglee

Great job spotting the typo; I will submit it for fixing. :+1:

Regarding this point - no, on the contrary - we want to predict the next word. For example, if we have the sentence “i am happy ___to___ learn” and we do not have the ('i', 'am', 'happy') tri-gram in our table, then we should use the ('am', 'happy') bi-gram instead, not ('i', 'am'), since we are trying to predict the word “___to___” (or other words, for that matter).

Cheers

Hi @arvyzukai

Thank you for the explanation, but I am confused.

First, I want to clarify that my question is about the Add-k smoothing method, not the back-off method and not the interpolation method.

My understanding is that the purpose of the add_k_smoothing_probability() function is to compute the probability of observing the ('i', 'am', 'happy') trigram conditioned on observing the bigram ('i', 'am'). So here, 'happy' is the next word of interest, and the function computes P(('i', 'am', 'happy') | ('i', 'am')) even when the ('i', 'am', 'happy') trigram does not appear in the training corpus.
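
In formula form (assuming the standard add-k estimate, with |V| denoting the vocabulary size):

P('happy' | 'i', 'am') = (count('i', 'am', 'happy') + k) / (count('i', 'am') + k * |V|)

so the estimate stays well defined even when count('i', 'am', 'happy') is 0.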

Do I correctly understand the purpose of the function add_k_smoothing_probability()?

Hi @younglee

Oh, in that case I think you’re correct. But I cannot verify it right now, since I’m on vacation and verifying code on a phone is problematic. I’ll be back in a week; in the meantime, maybe someone else can clarify the situation?

Hey @younglee,

Yes, you are correct in this. Also, note that the function is used not only when a trigram is absent, but also when it is present. If we apply add-k smoothing to estimate the probability of one trigram, we must do the same for every other trigram as well, since only then can we compare two trigrams, i.e., decide which trigram is more likely given the same bigram probabilities.
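
For instance, using the toy counts quoted earlier in the thread and the standard add-k formula (k and the vocabulary size below are placeholder values, not the notebook's):

k = 1
vocabulary_size = 3   # placeholder

prefix_count = 10             # count of the ('i', 'am') bigram
seen_trigram_count = 2        # ('i', 'am', 'happy') appears in the corpus
unseen_trigram_count = 0      # a hypothetical unseen trigram, e.g. ('i', 'am', 'learning')

p_seen = (seen_trigram_count + k) / (prefix_count + k * vocabulary_size)      # 3/13 ≈ 0.23
p_unseen = (unseen_trigram_count + k) / (prefix_count + k * vocabulary_size)  # 1/13 ≈ 0.08

# Both estimates are smoothed the same way, so they stay comparable:
# the observed trigram still gets the higher probability.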

As for those points, yes, you are correct. Thanks a lot for pointing out these discrepancies; let me raise an issue to get them fixed.

Cheers,
Elemento

Hey @younglee,
The discrepancies have been fixed. Once again, thanks a lot for pointing them out.

Cheers,
Elemento


@Elemento Thank you!