POS tag denoted by 𝑖 emits the first word of the given corpus

Varun_Malhotra · January 31, 2023, 1:48pm

Why do we have to use only the first word in the corpus and not any other word? What is the reason behind it?

Elemento · February 1, 2023, 6:15am

Hey @Varun_Malhotra,
The answer lies in the description of best_probs itself. Quoting from the notebook:

best_probs: Each cell contains the probability of going from one POS tag to a word in the corpus.

and

Column zero of best_probs is initialized with the assumption that the first word of the corpus was preceded by a start token (“–s–”).

The reasoning is pretty simple. best_probs is a matrix of dimension (num_tags, len(corpus)), so each column contains the entries associated with a single word of the corpus. Thus, the first column (index 0) contains the entries associated with the first word of the corpus, and in this exercise we are only initialising the 0th column of best_probs. The remaining columns, one per later word, are filled in afterwards. I hope this helps. Feel free to ask if anything is still unclear.
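To make the column-0 idea concrete, here is a minimal sketch with toy data. It is not the notebook's graded solution: the transition matrix A, the emission matrix B, the tag list, the vocab dictionary, the "--s--" spelling of the start token, and the use of log-probabilities are all assumptions made for illustration.

```python
import numpy as np

# Toy setup (all values are made up for illustration).
tags = ["--s--", "NN", "VB"]          # assumed tag list; index 0 is the start token
vocab = {"dogs": 0, "run": 1}         # assumed word -> column index in B
corpus = ["dogs", "run"]

# A[i, j]: assumed probability of moving from tag i to tag j.
A = np.array([[0.0, 0.7, 0.3],
              [0.1, 0.4, 0.5],
              [0.2, 0.6, 0.2]])

# B[i, w]: assumed probability that tag i emits word w.
B = np.array([[0.5, 0.5],
              [0.8, 0.2],
              [0.1, 0.9]])

num_tags = len(tags)
best_probs = np.zeros((num_tags, len(corpus)))   # one column per corpus word

s_idx = tags.index("--s--")           # row of the start token in A
first_word_idx = vocab[corpus[0]]     # only the FIRST word is used here

for i in range(num_tags):
    if A[s_idx, i] == 0:
        # Tag i can never follow the start token.
        best_probs[i, 0] = float("-inf")
    else:
        # log P(tag i | start) + log P(first word | tag i)
        best_probs[i, 0] = np.log(A[s_idx, i]) + np.log(B[i, first_word_idx])

print(best_probs)
```

Only column 0 is touched, because only the first word is preceded by the start token. Column 1 stays at zero here and would be filled in by a later forward pass.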

Cheers,
Elemento
