# Probability of arbitrary sentences to be 1

Hi everyone,

I have a question about one of the lectures, where the professor says that the probabilities of all sentences of arbitrary length should sum to 1.

Could someone explain intuitively why we would expect this?

The sum of probabilities is always equal to 1, no?

For example, for a particular coin, we can’t say that the probability of heads is 60% (0.6) and the probability of tails is 50% (0.5). That would add up to 110% (1.1).

It’s either

• Head 60%, tails 40%, or

The overall sum should always be equal to 100%, i.e. 1.

If I understand correctly, Shubham is asking how including the </s> token changes things so that the probabilities of sentences of all lengths sum to one.

When I was doing the course, I had the same question. For me, actual numbers are more intuitive, so I’ll try to explain it with them.

Let’s consider all the possible 2-“word” and 3-“word” sentences that can be constructed from the letters ‘a’ and ‘b’ (a very simple language):

Here, on the left side, you can see that when we don’t have a </s> token, the probabilities sum to 1 for each length separately, so they do not sum to 1 overall. On the right side, with the inclusion of the </s> token, the probabilities of all 2-word and 3-word sentences together sum to 0.38.
Note that here:
p(aa)=p(a|<s>) * p(a|a) = 0.5 * 0.5 = 0.25 # without </s>
p(aa)=p(a|<s>) * p(a|a) * p(</s>|a) = 0.5 * 0.3125 * 0.375 = ~0.06 # with </s>

I don’t have a formal proof that sentences of all lengths (including 4, 5, … a million, and so on to infinity) would sum to 1, but you can get an intuitive understanding by extending the sentences to 4 words:
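This is not a full proof, but here is a sketch under a simplifying assumption: suppose every word has the same probability $q$ of being followed by </s> (with $0 < q \le 1$; the symbol $q$ is notation introduced here, not taken from the lecture), and <s> is never followed directly by </s>. To end after exactly $n$ words, the model has to "continue" $n-1$ times and then stop, so summing over all sentences of that length gives a geometric series:

$$
P(\text{length} = n) = (1-q)^{\,n-1}\, q,
\qquad
\sum_{n=1}^{\infty} (1-q)^{\,n-1}\, q = \frac{q}{1-(1-q)} = 1 .
$$

With $q = 0.375$ (the value of p(</s>|a) used above), lengths 2 and 3 contribute $0.375 \cdot (0.625 + 0.625^2) \approx 0.38$, which agrees with the 0.38 quoted above. Note that 1-word sentences are part of the total as well; they carry probability $q$.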

Now the total without the </s> token is 3 (1 for each of the three lengths), while with the </s> token the total is 0.46: it increased only slightly. So, intuitively, you can expect each additional sentence length to add a smaller and smaller amount, with the running total approaching 1 as the length goes to infinity.
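If you want to play with the numbers yourself, here is a minimal Python sketch that enumerates every sentence up to a given length and adds up the probabilities. Only p(a|<s>) = 0.5, p(a|a) = 0.3125 and p(</s>|a) = 0.375 come from the note above; the rest of the table is my assumption (the row for ‘b’ mirrors the row for ‘a’, and <s> never goes straight to </s>), so the totals may differ slightly from the figures in the post. The names `BIGRAM`, `sentence_prob`, `mass_of_length` and `cumulative_mass` are made up for this example.

```python
from itertools import product

# Bigram probabilities p(next | previous).
# From the post: p(a|<s>) = 0.5, p(a|a) = 0.3125, p(</s>|a) = 0.375.
# ASSUMED (the original table is not shown): the 'b' row mirrors the 'a' row,
# p(b|a) is whatever is left over in the 'a' row, and <s> never goes to </s>.
BIGRAM = {
    "<s>": {"a": 0.5, "b": 0.5},
    "a": {"a": 0.3125, "b": 0.3125, "</s>": 0.375},
    "b": {"a": 0.3125, "b": 0.3125, "</s>": 0.375},
}


def sentence_prob(words):
    """Probability of the sequence <s> w1 ... wn </s> under BIGRAM."""
    prob = 1.0
    prev = "<s>"
    for token in list(words) + ["</s>"]:
        prob *= BIGRAM[prev].get(token, 0.0)
        prev = token
    return prob


def mass_of_length(n):
    """Total probability of all sentences with exactly n words."""
    return sum(sentence_prob(words) for words in product("ab", repeat=n))


def cumulative_mass(max_len):
    """Total probability of all sentences with 1..max_len words."""
    return sum(mass_of_length(n) for n in range(1, max_len + 1))


if __name__ == "__main__":
    for max_len in (1, 2, 3, 4, 8, 12):
        print(max_len, round(cumulative_mass(max_len), 4))
```

Under these assumptions the running total up to length n works out to 1 − 0.625^n, so it keeps growing by smaller and smaller steps and never exceeds 1. Keep in mind that `cumulative_mass` also counts 1-word sentences, whereas the 0.38 figure above covers only lengths 2 and 3.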

I hope that helps


Thanks @arvyzukai. I misunderstood what was asked here. Thanks for jumping in and providing all this information!