I want to suggest some changes to the code in the “smoothing” part of the lab notebook, as described below. I would be happy to hear any thoughts.
First, I think there is an unintended typo in the function name: the ‘h’ in “smoothing” is missing. I suggest renaming the function: add_k_smooting_probability → add_k_smoothing_probability
Second, in the code below from the lecture notebook
Third, in the code that computes probability_unknown_trigram, I think passing bigram_probabilities[('i', 'am')] as the n_gram_prefix_count argument makes more sense.
Great job spotting the typo; I will submit it for fixing.
Regarding this point: no, on the contrary, we want to predict the next word. For example, if we have the sentence “i am happy ___to___ learn” and the ('i', 'am', 'happy') trigram is not in our table, then we should use the ('am', 'happy') bigram instead, not ('i', 'am'), since we are trying to predict the word “___to___” (or other words, for that matter).
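To make the fallback described above concrete, here is a minimal sketch of that idea. It is not the lab's code: the function name predict_next_word, the next_word_counts table and all the counts in it are hypothetical, made up purely for illustration.

```python
def predict_next_word(context, next_word_counts):
    """Return (most frequent next word, context actually used).

    Hypothetical helper: if the full context is missing from the table,
    drop its leftmost word and retry with the shorter context.
    """
    context = tuple(context)
    while context:
        candidates = next_word_counts.get(context)
        if candidates:
            # Pick the continuation with the highest count.
            return max(candidates, key=candidates.get), context
        context = context[1:]  # fall back to a shorter context
    return None, ()


# Toy table (made-up counts): context -> {next word: count}.
next_word_counts = {
    ('am', 'happy'): {'to': 3, 'because': 1},
    ('i', 'am'): {'happy': 2},
}

# ('i', 'am', 'happy') is not in the table, so ('am', 'happy') is used.
predicted_word, used_context = predict_next_word(
    ('i', 'am', 'happy'), next_word_counts
)
print(predicted_word, used_context)
```

The point of dropping the leftmost word is that the words closest to the blank are the ones kept as context.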
First, I want to clarify that my question is about the add-k smoothing method, not the back-off method and not the interpolation method.
My understanding is that the purpose of the add_k_smoothing_probability() function is to compute the probability of observing the ('i', 'am', 'happy') trigram conditional on observing the bigram ('i', 'am'). So here 'happy' is the next word of interest, and the function computes P(('i', 'am', 'happy') | ('i', 'am')) even when the ('i', 'am', 'happy') trigram does not exist in the training corpus.
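For reference, here is how I would sketch that computation. This assumes the usual add-k formula, (count + k) / (prefix count + k * vocabulary size). Only the n_gram_prefix_count argument name comes from the notebook discussion above; the other parameter names, the trigram_counts and bigram_counts dictionaries, and all the numbers are my own assumptions.

```python
def add_k_smoothing_probability(k, vocabulary_size, n_gram_count, n_gram_prefix_count):
    """Add-k estimate of P(last word | prefix).

    Assumed signature; the notebook's actual function may differ.
    """
    numerator = n_gram_count + k
    denominator = n_gram_prefix_count + k * vocabulary_size
    return numerator / denominator


# Hypothetical toy counts, not taken from the lab's corpus.
trigram_counts = {('i', 'am', 'sad'): 2}
bigram_counts = {('i', 'am'): 4}
vocabulary_size = 10
k = 1.0

# ('i', 'am', 'happy') was never seen, so its count is 0, but the
# prefix count is still that of the bigram ('i', 'am').
probability_unknown_trigram = add_k_smoothing_probability(
    k,
    vocabulary_size,
    n_gram_count=trigram_counts.get(('i', 'am', 'happy'), 0),
    n_gram_prefix_count=bigram_counts[('i', 'am')],
)
print(probability_unknown_trigram)  # (0 + 1) / (4 + 1 * 10) = 1/14
```

With these toy numbers the unseen trigram gets a small but non-zero probability, and the denominator depends on how often ('i', 'am') was seen.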
Do I correctly understand the purpose of the function add_k_smoothing_probability()?
Oh, in that case I think you’re correct. But I cannot verify it right now, since I’m on vacation and checking code on a phone is problematic. I’ll be back in a week; in the meantime, maybe someone else can clarify the situation?
Yes, you are correct about this. Also, note that the function is used not only when the trigram probabilities are absent, but also when they are present. If we apply add-k smoothing to estimate the probability of one trigram, we have to apply it to the other trigrams as well, because only then can we compare two trigrams, that is, decide which one is more likely given the same bigram prefix.
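A small sketch of why the smoothing has to be applied to every trigram that shares the prefix. The helper name add_k, the vocabulary and the counts are all made up for illustration; none of them come from the lab.

```python
def add_k(count, prefix_count, k, vocabulary_size):
    # Standard add-k estimate: (count + k) / (prefix_count + k * |V|).
    return (count + k) / (prefix_count + k * vocabulary_size)


# Hypothetical toy data for the prefix ('i', 'am').
vocabulary = ['happy', 'sad', 'learning', 'here']
prefix = ('i', 'am')
trigram_counts = {('i', 'am', 'sad'): 3, ('i', 'am', 'here'): 1}
prefix_count = 4  # times ('i', 'am') was seen; equals 3 + 1 here
k = 1.0

# Smooth seen and unseen trigrams alike, all with the same prefix count.
smoothed = {
    word: add_k(trigram_counts.get(prefix + (word,), 0), prefix_count, k, len(vocabulary))
    for word in vocabulary
}
total = sum(smoothed.values())

for word, probability in smoothed.items():
    print(prefix + (word,), probability)
print('sum over vocabulary:', total)
```

Because every candidate goes through the same formula, the smoothed values still sum to 1 over the vocabulary in this toy setup, so they can be compared directly. Mixing smoothed and unsmoothed estimates would break that.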
As to these, yes, you are indeed correct. Thanks a lot for pointing out these discrepancies. Let me raise an issue to get them fixed.