Hi @Mubsi and @Shubham_Kumar25
If I understand correctly, Shubham is asking how including the </s> token changes things so that the probabilities of sentences of all lengths sum to one.
When I was doing the course, I had the same question. For me, actual numbers are more intuitive, so I’ll try to explain it with them.
Let’s consider all the possible 2-“word” and 3-“word” sentences that can be constructed from the letters ‘a’ and ‘b’ (a very simple language):
Here, on the left side, you can see that when we don’t have a </s> token, probabilities sum to 1 for each length, but not overall. On the right side, with the inclusion of the </s> token, the probabilities of 2-word and 3-word sentences sum to 0.38.
Note:
p(aa) = p(a|<s>) * p(a|a)             = 0.5 * 0.5            = 0.25    # without </s>
p(aa) = p(a|<s>) * p(a|a) * p(</s>|a) = 0.5 * 0.3125 * 0.375 ≈ 0.06    # with </s>
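If it helps, here is the same arithmetic as a tiny Python snippet. The probability values are just the ones quoted above from my toy example; the variable names are mine:

```python
# Without </s>: only the transitions <s> -> a and a -> a matter.
p_a_given_start = 0.5   # p(a | <s>)
p_a_given_a     = 0.5   # p(a | a)
p_aa_without_eos = p_a_given_start * p_a_given_a
print(p_aa_without_eos)            # 0.25

# With </s>: the end token takes some probability mass from every context,
# so p(a | a) drops and an extra factor p(</s> | a) appears at the end.
p_a_given_start = 0.5      # p(a | <s>)
p_a_given_a     = 0.3125   # p(a | a), re-estimated with </s> in the counts
p_eos_given_a   = 0.375    # p(</s> | a)
p_aa_with_eos = p_a_given_start * p_a_given_a * p_eos_given_a
print(round(p_aa_with_eos, 4))     # ~0.0586
```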
I don’t have a formal proof that the probabilities of sentences of all lengths (including 4, 5, …, a million, …, ∞) sum to 1, but you can get an intuitive understanding by extending the sentences to 4 words:
Now, the total probability without the </s> token is 3, while with the </s> token it is 0.46, i.e. it only increased slightly. So, intuitively, you can expect a decaying increase as you include longer and longer sentence lengths (up to infinity). The sketch below lets you play with that intuition.
I hope that helps