In one of the lectures, the professor says that if we use the </s> token, the sum of probabilities over all n-length sequences (1 <= n < inf) will be 1.

I want to ask why we would want such behaviour. What would be the problem if we only had:
the sum of probabilities of length-2 sentences = 1,
and similarly the sum of probabilities of length-3 sentences = 1, and so on?

Is it because, say, if we had 3 unique words in our corpus, the sentences generated from those 3 words could be anywhere from length 2 up to length n, and we want the sum over all such cases to be 1?

Also, why didn't we include length-1 sentences in the above sum of probabilities? We can also generate length-1 sentences from the unique words present in our corpus.

Assuming that the sum of probabilities of length-2 sentences = 1 and, at the same time, the sum of probabilities of length-3 sentences = 1 is wrong from a probability-theory standpoint: a probability distribution must sum to 1 in total, over everything it assigns probability to, not to 1 per sentence length. That is pretty much what you already figured out.

I think length-1 sentences were simply not included in the assumed distribution, although one could include them, and then the sum of probabilities over all cases, length-1 sentences included, must be 1.
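To make this concrete, here is a minimal sketch with a hypothetical toy unigram model (the vocabulary, probabilities, and helper function are my own illustration, not from the lecture). At each step the model emits "a", "b", or </s>; a sentence's probability is the product of its word probabilities times the probability of emitting </s> at the end. Summed over all sentences of all lengths, these probabilities form a geometric series that converges to 1, whereas without </s> every fixed length sums to 1 on its own, so the "total" across lengths would diverge.

```python
from itertools import product

# Toy unigram model with an end token: continuation mass 0.6, stop mass 0.4.
p_word = {"a": 0.3, "b": 0.3}  # probabilities of emitting each word
p_end = 0.4                    # probability of emitting </s> (ends the sentence)

def sentence_prob(words):
    """P(sentence) = product of word probs, times P(</s>) at the end."""
    prob = p_end
    for w in words:
        prob *= p_word[w]
    return prob

# Sum the probabilities of all sentences up to length 7.
# Each length n contributes 0.4 * 0.6**n, so the partial sums approach 1.
total = 0.0
for n in range(0, 8):
    for sent in product(p_word, repeat=n):
        total += sentence_prob(sent)
print(f"sum over all sentences of length <= 7: {total:.6f}")

# Contrast: without </s>, if p("a") + p("b") = 1, then every fixed length
# already sums to 1 by itself, so summing across lengths cannot give 1.
q = {"a": 0.5, "b": 0.5}
s3 = sum(q[w1] * q[w2] * q[w3] for w1, w2, w3 in product(q, repeat=3))
print(f"sum over length-3 sentences without </s>: {s3:.6f}")
```

The </s> token is what lets a single distribution cover sentences of every length: it "spends" some probability mass at each step on stopping, so the infinite sum over lengths is a convergent series totalling 1.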

Maybe length-1 sentences were excluded because they are just individual words, and are something of an edge case with regard to whether they should count as sentences at all.