W3 count matrix & probability matrix question

In the lecture, the TA said that for a bigram the column is the second word and the row is the first word. But when I look at the table for the corpus “I study I learn”, it seems that the column is the first word of the bigram and the row is the second word. Just look at the “I learn” bigram, where I put green circles around the 1 and the 0. The 1 is where “I” is the column and “learn” is the row, while the “I” row, “learn” column has 0. This is from the video called N-gram language model.
Please comment about this.
Thank you
DS

Hi @Dennis_Sinitsky

I’m not sure I understand you. The table is organized exactly as stated:

The first green circle, in the boxed green row, is “I learn” (to help calculate P(“learn”|“I”) = 1/2 = 0.5).

In other words, the leftmost column holds the row labels, which are the first words of the bigrams, and the topmost row holds the column labels, which are the second words.

So the bottom green circle corresponds to “learn I”, which never occurs in the corpus; that is why its count is 0.
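
In case it helps to see the layout in code, here is a minimal Python sketch that rebuilds both matrices for “I study I learn”. It is not the course's code: the variable names, the vocabulary order (order of first appearance), and the choice to skip start/end tokens are my own assumptions, picked so that the numbers match the 1/2 above.

```python
from collections import Counter

# Corpus from the lecture example; no start/end tokens are added (assumption).
tokens = "I study I learn".split()

# Vocabulary in order of first appearance (assumed ordering, for readability).
vocab = list(dict.fromkeys(tokens))  # ['I', 'study', 'learn']

# Count each (first word, second word) pair of adjacent tokens.
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Rows are indexed by the FIRST word, columns by the SECOND word.
count_matrix = [
    [bigram_counts[(first, second)] for second in vocab]
    for first in vocab
]

# Divide each row by its row sum to get P(second | first).
# A row that sums to 0 (a word that never starts a bigram) is left as zeros.
prob_matrix = []
for row in count_matrix:
    total = sum(row)
    prob_matrix.append([c / total if total else 0.0 for c in row])

row_i, col_learn = vocab.index("I"), vocab.index("learn")
print(count_matrix[row_i][col_learn])  # count of "I learn"
print(count_matrix[col_learn][row_i])  # count of "learn I"
print(prob_matrix[row_i][col_learn])   # P("learn" | "I")
```

With the row for “I” holding the counts for “I I”, “I study” and “I learn”, the row sum is 2, which is where the denominator in 1/2 comes from.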

Or did I misunderstand your confusion?

Yes, thank you. Originally I did not think it through clearly. Columns are labeled by the word tokens in the zeroth row, while rows are labeled by the word tokens in the zeroth column. So it all makes sense.
DS
