Confused about definition of probability of bigram

ajeancharles · July 26, 2023, 9:58pm

Hi,

Why is the definition a conditional probability?

Why not an unconditional definition?

After all, you don’t split a unigram into its constituent letters and define it as
P(last letter in a word | sequence of letters before the last letter) = Count(word in question) / Count(sequence of letters before the last letter).

arvyzukai · July 27, 2023, 6:15am

Hi @ajeancharles

Because you want the conditional probability P(am | I): the probability of the word “am” given that the previous word is “I”.
P(“I am”) is something different. At best, your formula would give the probability of “I am” among all bigrams (and would you count unique occurrences of “I am” and of all bigrams, or not?).
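
To make the difference concrete, here is a minimal sketch on a made-up toy corpus. The corpus and the helper names `conditional_prob` and `joint_prob` are only illustrative assumptions, not anything from the course code.

```python
from collections import Counter

# Hypothetical toy corpus, used only for illustration.
corpus = "I am happy because I am learning and I like it".split()

unigram_counts = Counter(corpus)                   # Count(word)
bigram_counts = Counter(zip(corpus, corpus[1:]))   # Count(prev_word word)
total_bigrams = sum(bigram_counts.values())        # all bigram occurrences


def conditional_prob(prev_word, word):
    """P(word | prev_word) = Count(prev_word word) / Count(prev_word).

    Assumes prev_word occurs in the corpus (otherwise this divides by zero).
    """
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


def joint_prob(prev_word, word):
    """P(prev_word word) = Count(prev_word word) / Count(all bigrams)."""
    return bigram_counts[(prev_word, word)] / total_bigrams


p_cond = conditional_prob("I", "am")   # 2 of the 3 occurrences of "I" are followed by "am"
p_joint = joint_prob("I", "am")        # 2 of the 10 bigram occurrences are "I am"

print(f"P(am | I) = {p_cond:.3f}")     # P(am | I) = 0.667
print(f'P("I am") = {p_joint:.3f}')    # P("I am") = 0.200
```

The conditional version answers “given that I just saw ‘I’, how likely is ‘am’ next?”, while the joint version answers “how likely is a randomly picked bigram to be ‘I am’?”. Here the joint version counts every occurrence rather than unique bigrams, which is one possible answer to the counting question above.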

In other words, when you model language, you want the conditional probability of the next word given some preceding text. For example: What am I going to say ___? or P(“__” | “What am I going to say”)

Cheers

P.S. “next” should have a high probability

ajeancharles · July 27, 2023, 12:50pm

Thanks. Good last point. You are trying to generate the next word from the prefix string that precedes it, so the probability is conditional on that prefix.
Thanks
