In the Emission probability formula, shouldn’t the denominator be V*epsilon instead of N*epsilon?
States are restricted to parts of speech (e.g., noun, verb), not to words in a vocabulary.
So the denominator is correct.
Thanks for the clarification! It was slightly confusing because the assignment calculates the emission probability by multiplying the smoothing parameter by the size of the vocabulary rather than by the number of states.
You will use smoothing as defined below:
$$P(w_i \mid t_i) = \frac{C(t_i, word_i) + \alpha}{C(t_i) + \alpha \cdot N}$$
- $C(t_i, word_i)$ is the number of times $word_i$ was associated with $tag_i$ in the training data (stored in the `emission_counts` dictionary).
- $C(t_i)$ is the number of times $tag_i$ was in the training data (stored in the `tag_counts` dictionary).
- $N$ is the number of words in the vocabulary.
- $\alpha$ is a smoothing parameter.
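For anyone who wants to check the quoted formula numerically, here is a minimal sketch in Python. It is not the assignment's code: the function name `emission_probability`, the toy counts, and the assumption that `emission_counts` is keyed by `(tag, word)` tuples are all mine, and N is taken to be the vocabulary size as the quoted text defines it.

```python
def emission_probability(tag, word, emission_counts, tag_counts, vocab_size, alpha=0.001):
    """Smoothed P(word | tag) following the formula quoted above.

    Assumes emission_counts maps (tag, word) -> count and
    tag_counts maps tag -> count (hypothetical layout for this sketch).
    """
    pair_count = emission_counts.get((tag, word), 0)  # unseen pairs count as 0
    return (pair_count + alpha) / (tag_counts[tag] + alpha * vocab_size)


# Toy data, invented for illustration only.
vocab = ["dog", "runs", "fast"]
emission_counts = {("NN", "dog"): 3, ("NN", "runs"): 1, ("VB", "runs"): 2}
tag_counts = {"NN": 4, "VB": 2}

# One row of the emission matrix: P(word | "NN") for every word in the vocabulary.
row_nn = [
    emission_probability("NN", w, emission_counts, tag_counts, len(vocab))
    for w in vocab
]
row_sum_nn = sum(row_nn)
print(row_nn, row_sum_nn)
```

With the smoothing parameter multiplied by the vocabulary size, the row for a tag sums to 1 (up to floating-point error); swapping `len(vocab)` for the number of tags in the call would break that.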
Aneesh is right: the smoothing parameter should be multiplied by V, the size of the vocabulary, in the denominator (matching the sum over index j, which runs up to V). Otherwise, the emission probabilities in each row wouldn’t add up to 1.
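To spell out the row-sum argument, here is a short derivation. It assumes that every occurrence of tag $t_i$ in the training data emits exactly one word from the vocabulary, so that $\sum_{j=1}^{V} C(t_i, w_j) = C(t_i)$; the symbol $M$ below is just a placeholder I am introducing for "whatever the smoothing parameter is multiplied by".

$$\sum_{j=1}^{V} \frac{C(t_i, w_j) + \alpha}{C(t_i) + \alpha M} = \frac{\sum_{j=1}^{V} C(t_i, w_j) + \alpha V}{C(t_i) + \alpha M} = \frac{C(t_i) + \alpha V}{C(t_i) + \alpha M}$$

For $\alpha > 0$ this equals 1 only when $M = V$, so under that assumption any other multiplier (for example, the number of tags) leaves each row unnormalized.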
Also, it seems some effort should be put into fixing the notation and making it consistent between the assignments and the lecture videos/notes. (I’m somewhat surprised that this hasn’t been fixed for over 2 years, given that this is both a paid course and a MOOC reaching thousands of students.)
@lucas.coutinho, can you investigate the issue reported here?