Confused about the definition of bigram probability


Why is the definition a conditional probability?

Why not an unconditional definition,

You don’t split a unigram into its constituent letters. By analogy, the same kind of definition for a unigram would be:
P(last letter of a word | sequence of letters before the last letter) = Count(word in question) / Count(sequence of letters before the last letter)

Hi @ajeancharles

Because you want the conditional probability P(am | I): the probability of the word “am” given that the previous word is “I”.
P(“I am”) is something different. Your formula would give, at best, the probability of “I am” among all bigrams (and would you count only unique occurrences of “I am” and of all bigrams, or every occurrence?).
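To make the difference concrete, here is a minimal sketch on a made-up toy corpus. The corpus and the function names `conditional_prob` and `joint_prob` are my own assumptions for illustration, not something from the course material. It compares Count(“I am”) / Count(“I”) with Count(“I am”) / Count(all bigram occurrences):

```python
from collections import Counter

# Toy corpus, invented for illustration only.
tokens = "I am happy because I am learning and I like it".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))


def conditional_prob(prev_word, word):
    """P(word | prev_word) = Count(prev_word word) / Count(prev_word).

    Assumes prev_word occurs in the corpus; otherwise this divides by zero.
    """
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


def joint_prob(prev_word, word):
    """Share of this bigram among all bigram occurrences (not unique bigrams)."""
    total_bigrams = sum(bigram_counts.values())
    return bigram_counts[(prev_word, word)] / total_bigrams


print(conditional_prob("I", "am"))  # 2/3: "I" occurs 3 times, "I am" twice
print(joint_prob("I", "am"))        # 2/10: "I am" is 2 of the 10 bigrams
```

The two numbers answer different questions. The first says how likely “am” is once you have already seen “I”; the second says how often the pair “I am” shows up relative to every other pair.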

In other words, when you model language, you want the conditional probability of the next word given some preceding text. For example, to fill the blank in “What am I going to say ___?”, you want P(“___” | “What am I going to say”).


P.S. “next” should have a high probability :slight_smile:
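As a rough sketch of that idea (again with an invented toy corpus and hypothetical helper names), a bigram model approximates the full prefix by conditioning only on the last word, then picks the most likely follower:

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
tokens = "what am I going to say next I am going to say next".split()

# For each previous word, count the words that follow it.
followers = defaultdict(Counter)
for prev_word, word in zip(tokens, tokens[1:]):
    followers[prev_word][word] += 1


def next_word_distribution(prev_word):
    """Return P(word | prev_word) for every word seen after prev_word."""
    counts = followers.get(prev_word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def predict_next(prev_word):
    """Most probable next word, or None if prev_word was never followed by anything."""
    dist = next_word_distribution(prev_word)
    return max(dist, key=dist.get) if dist else None


print(predict_next("say"))           # next
print(next_word_distribution("am"))  # {'I': 0.5, 'going': 0.5}
```

A real model would condition on more context and smooth the counts, but the conditional form P(word | previous words) is what makes this kind of prediction possible.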

Thanks. Good last point. You are trying to generate the next word from the prefix of words that precedes it, so the probability is conditional on that prefix.