Why do we have to only use the first word in the corpus and not any other word? What is the reason behind it?
The answer lies in the description of best_probs itself. Quoting the notebook:
best_probs: Each cell contains the probability of going from one POS tag to a word in the corpus.
Column zero of best_probs is initialized with the assumption that the first word of the corpus was preceded by a start token ("--s--").
The reasoning is simple. best_probs is a matrix of dimension (num_tags, len(corpus)), so each column contains the entries associated with a single word of the corpus. The first column (index 0) therefore holds the entries for the first word of the corpus. It is also the only column with no preceding word, which is why it is filled in from the start token rather than from a previous column. In this exercise, we are only initialising the 0th column of best_probs; the columns for the other words are filled in afterwards, each from the column before it. I hope this helps. Feel free to ask if anything is still unclear.
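In case it helps to see it concretely, here is a minimal sketch of initialising only column 0. It is not the assignment's solution: the tag list, toy corpus, and the matrices A (tag-to-tag transitions) and B (tag-to-word emissions) are made-up values, and storing log-probabilities with -inf for impossible cases is my own assumption for the sketch.

```python
import math
import numpy as np

# Toy values, all hypothetical (not taken from the assignment).
tags = ["--s--", "NN", "VB"]        # index 0 is the start tag
corpus = ["dogs", "run", "fast"]    # three-word toy corpus
vocab = {"dogs": 0, "run": 1, "fast": 2}

# A[i, j]: probability of moving from tag i to tag j.
A = np.array([[0.0, 0.7, 0.3],
              [0.1, 0.4, 0.5],
              [0.1, 0.6, 0.3]])
# B[j, k]: probability that tag j emits word k.
B = np.array([[0.0, 0.0, 0.0],
              [0.6, 0.1, 0.3],
              [0.1, 0.7, 0.2]])

num_tags = len(tags)
# One row per tag, one column per word of the corpus.
best_probs = np.zeros((num_tags, len(corpus)))

s_idx = tags.index("--s--")       # row of the start token in A
first_word = vocab[corpus[0]]     # only the first word is used here

for i in range(num_tags):
    if A[s_idx, i] == 0 or B[i, first_word] == 0:
        # Impossible start: log(0) is undefined, so store -inf.
        best_probs[i, 0] = float("-inf")
    else:
        # Start token -> tag i, then tag i emits the first word.
        best_probs[i, 0] = math.log(A[s_idx, i]) + math.log(B[i, first_word])

print(best_probs)
```

Only column 0 changes. Columns 1 and 2 stay at zero here because each of them depends on the column to its left, so they cannot be filled in until column 0 exists.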