Possible Issue with Emission Matrix Formula

Aneesh_Bose · May 5, 2022, 1:41pm

In the Emission probability formula, shouldn’t the denominator be V*epsilon instead of N*epsilon?

balaji.ambresh · May 5, 2022, 1:58pm

States are restricted to parts of speech (e.g., noun, verb) and not to words in a vocabulary.
So the denominator is correct.

Aneesh_Bose · June 6, 2022, 7:35pm

Thanks for the clarification! It was slightly confusing because the assignment calculates the emission probability by multiplying the smoothing parameter by the size of the vocabulary instead of the number of states.

You will use smoothing as defined below:

$$P(w_i \mid t_i) = \frac{C(t_i, word_i) + \alpha}{C(t_i) + \alpha \cdot N}$$

$C(t_i, word_i)$ is the number of times $word_i$ was associated with $tag_i$ in the training data (stored in emission_counts dictionary).
$C(t_i)$ is the number of times $tag_i$ was in the training data (stored in tag_counts dictionary).
$N$ is the number of words in the vocabulary.
$\alpha$ is a smoothing parameter.

bwegge · July 21, 2024, 11:11am

Aneesh is right: in the denominator, the smoothing parameter should be multiplied by V, the size of the vocabulary (consistent with the sum index j running up to V). Otherwise, the emission probabilities in each row wouldn’t sum to 1.
Also, some effort should go into fixing the notation and making it consistent between the assignments and the lecture videos/notes. (I’m somewhat surprised that this hasn’t been fixed for over 2 years, given that this is a paid MOOC reaching thousands of students.)
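
For anyone who wants to check the row-sum argument numerically, here is a minimal sketch. The tags, vocabulary, counts, and the names toy_counts and smoothed_row are all made up for illustration and are not taken from the assignment.

```python
# Hypothetical toy counts C(tag, word): 2 tags (states), 4-word vocabulary.
# These numbers are invented for illustration only.
vocab = ["dog", "run", "cat", "eat"]
toy_counts = {
    "NN": {"dog": 3, "run": 0, "cat": 2, "eat": 0},
    "VB": {"dog": 0, "run": 4, "cat": 0, "eat": 1},
}

def smoothed_row(word_counts, alpha, size):
    """Return (C(t, w) + alpha) / (C(t) + alpha * size) for every word w."""
    tag_count = sum(word_counts.values())  # C(t)
    return [(word_counts[w] + alpha) / (tag_count + alpha * size) for w in vocab]

alpha = 0.5
num_words = len(vocab)       # size of the vocabulary
num_tags = len(toy_counts)   # number of states (tags)

# Denominator uses alpha * (vocabulary size): each row is a proper distribution.
row_sums_vocab = {t: sum(smoothed_row(c, alpha, num_words)) for t, c in toy_counts.items()}

# Denominator uses alpha * (number of tags): the rows no longer sum to 1.
row_sums_tags = {t: sum(smoothed_row(c, alpha, num_tags)) for t, c in toy_counts.items()}

print(row_sums_vocab)
print(row_sums_tags)
```

With the vocabulary size in the denominator, the alpha added to each of the V numerators in a row is matched by the alpha * V in the denominator, so every row sums to 1 up to floating-point error. With the number of tags instead, the two no longer cancel unless the number of tags happens to equal V.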

TMosh · August 5, 2024, 7:27pm

@lucas.coutinho, can you investigate the issue reported here?
