Formula for Emission Matrix

skyrockets_21 · July 14, 2022, 12:49am

Hi all,

I believe there is a mistake in the formula for populating the emission matrix.

The denominator should be C(t_i) + V * \epsilon, since we are summing over all the words in the vocabulary given t_i.

Or am I misunderstanding something?

Best,
Thomas

arvyzukai · July 14, 2022, 6:21am

For consistency of notation, yes, it should be V here instead of N. The same goes for the code hints in the assignment (see this post).

N is usually associated with counts (natural numbers) regardless of what is being counted, so I guess we could not call it a mistake, but for clarity it could have been V.

skyrockets_21 · July 14, 2022, 6:26am

Thank you for the clarification @arvyzukai !

Dennis_Sinitsky · March 5, 2024, 2:17am

Thank you for the note. I was going to ask the same question. In my opinion, the video lecture should have a clarification pop-up on this matter. N and V are not the same: N is the number of states (tags), and V is the number of words in the vocabulary. When summing over a row, the probabilities should add up to 1 (as in the quiz question), which is not the case with +N*\epsilon.
Thank you
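
To spell out the row-sum argument, here is a short derivation. It assumes that C(t_i) is the total count of tag t_i, so that summing C(t_i, w_j) over all V words gives C(t_i); the index j for words is an assumption of this sketch, not the notation used in the video.

```latex
% Row i of the emission matrix with V in the denominator:
\sum_{j=1}^{V} \frac{C(t_i, w_j) + \epsilon}{C(t_i) + V\epsilon}
  = \frac{\sum_{j=1}^{V} C(t_i, w_j) + V\epsilon}{C(t_i) + V\epsilon}
  = \frac{C(t_i) + V\epsilon}{C(t_i) + V\epsilon}
  = 1

% The same row with N in the denominator:
\sum_{j=1}^{V} \frac{C(t_i, w_j) + \epsilon}{C(t_i) + N\epsilon}
  = \frac{C(t_i) + V\epsilon}{C(t_i) + N\epsilon}
  \neq 1 \quad \text{for } \epsilon > 0 \text{ and } N \neq V
```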

dweile · November 20, 2024, 5:12pm

Should the C(ti, wi) be C(ti, wj) instead? I don’t think the former can cover all tag-word combinations.

conscell · November 21, 2024, 2:21am

Hi @dweile,
Excellent question! I believe that videos follow the notation of https://web.stanford.edu/~jurafsky/slp3/17.pdf

Given a labeled corpus, where each word w_i is associated with a label t_i,
the emission probabilities P(w_i | t_i) = {C(t_i, w_i) \over C(t_i)} represent the probability that a given tag t_i is associated with a given word w_i, for all i. When we populate the matrix of emission probabilities B = (b_{ij}), we compute b_{ij} = {C(t_i, w_j) + \alpha \over C(t_i) + \alpha V} for all i, j with 1 \le i \le N, 1 \le j \le V.
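
As a rough illustration of the b_{ij} formula, here is a minimal NumPy sketch. The function name `emission_matrix`, the toy counts, and the value of alpha are all made up for this example and are not taken from the assignment; it also assumes that C(t_i) equals the row sum of the tag-word counts.

```python
import numpy as np

def emission_matrix(counts, alpha):
    """Smoothed emission matrix from tag-word counts.

    counts[i, j] is C(t_i, w_j); the shape is (N, V) with N tags, V words.
    """
    counts = np.asarray(counts, dtype=float)
    vocab_size = counts.shape[1]                    # V
    tag_counts = counts.sum(axis=1, keepdims=True)  # C(t_i), assumed = row sum
    return (counts + alpha) / (tag_counts + alpha * vocab_size)

# Toy counts for N = 2 tags and V = 4 words (hypothetical data).
counts = np.array([[3, 0, 1, 0],
                   [0, 2, 0, 2]])
alpha = 0.001

B = emission_matrix(counts, alpha)
row_sums = B.sum(axis=1)

# For comparison: the same numerator with N (number of tags) in the denominator.
num_tags = counts.shape[0]
B_with_n = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * num_tags)
row_sums_with_n = B_with_n.sum(axis=1)

print(row_sums)
print(row_sums_with_n)
```

With V in the denominator every row sums to 1; with N the rows drift away from 1 whenever N and V differ.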