In one of the lectures, the professor says that if we use the </s> token, the sum of probabilities over all n-length sequences (1 <= n < inf) will be 1.

I want to ask why we would want such behaviour. What would be the problem if we only had:
the sum of probabilities of length-2 sentences = 1,
and similarly the sum of probabilities of length-3 sentences = 1, and so on?

Is it because, say, if we had 3 unique words in our corpus, the sentences generated from those 3 words could be anywhere from length 2 up to length n, and we want the sum over all such cases to be 1?

Also, why didn't we include length-1 sentences in the above sum of probabilities? We can also generate length-1 sentences from the unique words present in our corpus.

Assuming that the sum of probabilities of length-2 sentences = 1 and, at the same time, the sum of probabilities of length-3 sentences = 1 is wrong from a probability-theory standpoint: a probability distribution must sum to 1 in total, over everything it assigns probability to, not to 1 per sentence length. That is pretty much what you already figured out.

I think length-1 sentences were simply not included in the assumed distribution, although one could include them, and then the sum of probabilities over all cases, length-1 sentences included, must be 1.
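To make this concrete, here is a minimal sketch with a hypothetical toy unigram model (the vocabulary, probabilities, and helper function are my own illustration, not from the lecture). At each step the model emits "a", "b", or </s>; a sentence's probability is the product of its word probabilities times the probability of emitting </s> at the end. Summed over all sentences of all lengths, these probabilities form a geometric series that converges to 1, whereas without </s> every fixed length sums to 1 on its own, so the "total" across lengths would diverge.

```python
from itertools import product

# Toy unigram model with an end token: continuation mass 0.6, stop mass 0.4.
p_word = {"a": 0.3, "b": 0.3}  # probabilities of emitting each word
p_end = 0.4                    # probability of emitting </s> (ends the sentence)

def sentence_prob(words):
    """P(sentence) = product of word probs, times P(</s>) at the end."""
    prob = p_end
    for w in words:
        prob *= p_word[w]
    return prob

# Sum the probabilities of all sentences up to length 7.
# Each length n contributes 0.4 * 0.6**n, so the partial sums approach 1.
total = 0.0
for n in range(0, 8):
    for sent in product(p_word, repeat=n):
        total += sentence_prob(sent)
print(f"sum over all sentences of length <= 7: {total:.6f}")

# Contrast: without </s>, if p("a") + p("b") = 1, then every fixed length
# already sums to 1 by itself, so summing across lengths cannot give 1.
q = {"a": 0.5, "b": 0.5}
s3 = sum(q[w1] * q[w2] * q[w3] for w1, w2, w3 in product(q, repeat=3))
print(f"sum over length-3 sentences without </s>: {s3:.6f}")
```

The </s> token is what lets a single distribution cover sentences of every length: it "spends" some probability mass at each step on stopping, so the infinite sum over lengths is a convergent series totalling 1.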

Maybe length-1 sentences were excluded because they are just individual words, and are something of an edge case with regard to whether they should count as sentences at all.