Hi.

Could you please elaborate on adding the end token to a sentence? I understand that not having an end token would affect the total prior probability calculation. I also understand that the probabilities of all sentences of a given length would sum to 1. However, I am unable to get the intuition behind this (though it looks mathematically correct). How does adding just a single token at the end solve this?

Maybe a before-and-after comparison of adding the end token (with probability values) would help.

Thanks!

Hi @Karthik_Menon09

That is a good question, and I think everybody arrives at their intuition in different ways. For me, concrete calculations are the best way to get the intuition. Here is a bigram example with sentences of length 2 and length 3:

Note that when there is no “</s>”, the probabilities sum to 1 for each length, which gives 2 in total in this case. With the “</s>” token, on the other hand, the probabilities decay with sentence length and sum to 0.38 in this case.
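If it helps to reproduce this kind of calculation, here is a minimal Python sketch. The two-word vocabulary and all bigram probabilities in it are made-up values for illustration, not the numbers from the example above, and the names `NO_END`, `WITH_END`, `sentence_prob` and `total_mass` are hypothetical. It enumerates every sentence of a given length and adds up the probabilities, once without “</s>” and once with it.

```python
from itertools import product

VOCAB = ["a", "b"]

# Assumed bigram probabilities WITHOUT an end token:
# each row sums to 1 over the vocabulary only.
NO_END = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.3, "b": 0.7},
}

# Assumed bigram probabilities WITH an end token:
# each row after a word sums to 1 over the vocabulary plus "</s>".
WITH_END = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.4, "b": 0.4, "</s>": 0.2},
    "b": {"a": 0.2, "b": 0.5, "</s>": 0.3},
}


def sentence_prob(words, model, use_end):
    """Product of bigram probabilities along <s> w1 ... wn [</s>]."""
    tokens = ["<s>", *words]
    if use_end:
        tokens.append("</s>")
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= model[prev][cur]
    return prob


def total_mass(length, model, use_end):
    """Sum of probabilities over ALL sentences with exactly `length` words."""
    return sum(
        sentence_prob(words, model, use_end)
        for words in product(VOCAB, repeat=length)
    )


for n in range(1, 5):
    no_end = total_mass(n, NO_END, use_end=False)
    with_end = total_mass(n, WITH_END, use_end=True)
    print(f"length {n}: no </s> = {no_end:.4f}, with </s> = {with_end:.4f}")
```

Without “</s>”, each length gets a total of 1, so the grand total grows by 1 with every length you add. With “</s>”, part of the probability mass is spent on stopping at each step, so each length gets a smaller share than the previous one and the shares over all lengths together approach 1.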

Adding sentences of length 4 would not change the probabilities for sentences of length 2 and 3 without the “</s>” symbol, but it would change the probabilities for sentences with the “</s>” symbol:

The only reason for that is having the third column, which changes the estimated conditional probabilities.

I hope that helps with your intuition.

Cheers