W3 count matrix & probability matrix question

Dennis_Sinitsky · March 6, 2024, 7:28pm

In the lecture, the TA said that for a bigram the column is the second word and the row is the first word. But when I look at the table for the corpus “I study I learn”, it seems that the column is the first word of the bigram and the row is the second word. Just look at the “I learn” bigram, where I put green circles around the 1 and the 0. The 1 is where “I” is the column and “learn” is the row, while the cell with “I” as the row and “learn” as the column has 0. This is from the video called N-gram language model.
Please comment about this.
Thank you
DS

arvyzukai · March 7, 2024, 8:43am

Hi @Dennis_Sinitsky

I’m not sure I understand you. The table is organized exactly as stated:

The first green circle in the boxed green row is “I learn” (to help calculate P(“learn”|“I”) = 1/2 = 0.5).

In other words, the values in the leftmost column (the row labels) are the first words, and the values in the topmost row (the column labels) are the second words of the bigrams.

So for the bottom green circle, there is no “learn I” bigram in the corpus, which is why the count is 0.
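
To make the row/column convention concrete, here is a minimal sketch that rebuilds the count matrix for “I study I learn”. It assumes plain whitespace tokenization and no start or end tokens, and the names `counts` and `prob` are only illustrative (they are not taken from the course code):

```python
# Corpus from the lecture example, split on whitespace (assumption: no <s> or </s> tokens).
corpus = "I study I learn".split()
vocab = ["I", "study", "learn"]

# counts[first][second]: the row is the first word, the column is the second word.
counts = {first: {second: 0 for second in vocab} for first in vocab}
for first, second in zip(corpus, corpus[1:]):
    counts[first][second] += 1


def prob(second, first):
    """Estimate P(second | first) by dividing the cell by its row total."""
    row_total = sum(counts[first].values())
    return counts[first][second] / row_total if row_total else 0.0


print(counts["I"]["learn"])  # row "I", column "learn" -> 1
print(counts["learn"]["I"])  # row "learn", column "I" -> 0
print(prob("learn", "I"))    # 1 / (1 + 1) -> 0.5
```

Reading the matrix row by row gives all the bigrams that start with that row's word, which is why the probability divides by the row total.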

Or did I misunderstand your confusion?

Dennis_Sinitsky · March 7, 2024, 4:52pm

Yes, thank you. Originally I did not think it through clearly. Columns are labeled by the word tokens in the zeroth row, while rows are labeled by the word tokens in the zeroth column. So it all makes sense.
DS
