Why can’t we just use best_probs with only the forward pass, and drop best_paths and the backtracing?
Let me use the diagram given in the notebook to clarify my point:
Suppose we use argmax on column 0. The highest probability is -14.32, so we obtain row index 20, whose corresponding tag is NN. This tag is stored.
Moving on to column 1, the highest probability is -25.13 at row index 40, so we get VBZ, which is stored right after the last tag.
Finally, in column 2 the highest probability is -34.99 at index 28, whose corresponding tag is RB, which is also stored.
Thus we have stored the sequence NN-VBZ-RB, which is exactly the same sequence we would get with backtracing.
So, can you please explain why we also need backtracing, if we get the same path even without it?
Is there any disadvantage in the procedure I described above?
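In code, the readout I’m describing would look roughly like this (using a tiny hypothetical best_probs with one row per tag, just to make the idea concrete; only the three highlighted log probabilities come from the diagram, the rest are made up, and the real matrix in the notebook is much bigger):

```python
import numpy as np

# Hypothetical stand-in for the notebook's best_probs (rows = tags, columns = words),
# built around the three highlighted log probabilities from the diagram.
states = ["NN", "VBZ", "RB"]
best_probs = np.array([
    [-14.32, -30.00, -40.00],   # NN
    [-20.00, -25.13, -41.00],   # VBZ
    [-22.00, -28.00, -34.99],   # RB
])

# Take the argmax of each column independently and read off the tags.
greedy_tags = [states[int(np.argmax(best_probs[:, i]))] for i in range(best_probs.shape[1])]
print(greedy_tags)   # ['NN', 'VBZ', 'RB']
```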
Hi @Doron_Modan,
The diagram given in the notebook shows only parts of the best_probs and best_paths matrices, where the values with the highest probabilities happen to point to previous values with similarly high probabilities. However, this is not generally the case. When we populate these matrices using viterbi_forward(), for the i^{th} word in the corpus and the current POS tag j we compute
\displaystyle \mathrm{best\_probs}_{j, i} = \max_k \left( \mathrm{best\_probs}_{k, i-1} + \log \mathbf{A}_{k, j} + \log \mathbf{B}_{j,\, vocab(corpus_i)} \right),
\displaystyle \mathrm{best\_paths}_{j, i} = \mathop{\mathrm{argmax}}_k \left( \mathrm{best\_probs}_{k, i-1} + \log \mathbf{A}_{k, j} + \log \mathbf{B}_{j,\, vocab(corpus_i)} \right).
This process doesn’t guarantee that high-probability values will consistently point to other high-probability values in the previous step.
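To see this concretely, here is a minimal sketch (assuming NumPy and a toy 2-tag HMM with made-up probabilities, not the notebook’s actual matrices) that runs the forward pass once and then reads the result out both ways: with the proper backtrace through best_paths, and with the per-column argmax shortcut from your question. For these numbers the two disagree, because the tag that wins column 0 on its own is not the tag that the best complete path passes through:

```python
import numpy as np

def viterbi_forward(log_pi, log_A, log_B, obs):
    """Forward pass: fill best_probs (scores) and best_paths (back-pointers)."""
    n_tags, T = log_A.shape[0], len(obs)
    best_probs = np.full((n_tags, T), -np.inf)
    best_paths = np.zeros((n_tags, T), dtype=int)
    best_probs[:, 0] = log_pi + log_B[:, obs[0]]          # initialization
    for i in range(1, T):
        for j in range(n_tags):
            scores = best_probs[:, i - 1] + log_A[:, j] + log_B[j, obs[i]]
            best_paths[j, i] = np.argmax(scores)          # which previous tag produced this cell
            best_probs[j, i] = np.max(scores)
    return best_probs, best_paths

def backtrace(best_probs, best_paths):
    """Proper Viterbi readout: start at the best final tag, then follow back-pointers."""
    T = best_probs.shape[1]
    path = [int(np.argmax(best_probs[:, -1]))]
    for i in range(T - 1, 0, -1):
        path.append(int(best_paths[path[-1], i]))
    return path[::-1]

def column_argmax(best_probs):
    """The shortcut from the question: per-column argmax, ignoring best_paths."""
    return [int(np.argmax(col)) for col in best_probs.T]

# Toy 2-tag, 2-word HMM with made-up probabilities (rows/columns: tag 0, tag 1).
log_pi = np.log([0.45, 0.55])                 # initial tag probabilities
log_A  = np.log([[0.8, 0.2], [0.3, 0.7]])     # transition probabilities
log_B  = np.log([[0.5, 0.9], [0.5, 0.2]])     # emission probabilities
obs    = [0, 1]                               # observed word indices

best_probs, best_paths = viterbi_forward(log_pi, log_A, log_B, obs)
print("backtrace:     ", backtrace(best_probs, best_paths))   # [0, 0]
print("column argmax: ", column_argmax(best_probs))           # [1, 0]
```

Here the column-0 argmax picks tag 1, but the back-pointer stored at (tag 0, step 1) says the best complete path came from tag 0, so the backtrace returns [0, 0] while the per-column shortcut returns [1, 0], which is not the most probable sequence.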