Formula for Emission Matrix

Hi all,

I believe there is a mistake in the formula for populating the emission matrix.

The denominator should be C(t_i) + V \epsilon, since we are summing over all the words w_i given t_i.

Or am I misunderstanding something?

Best,
Thomas


Hi @skyrockets_21

For consistency of notation, yes: it should be V here instead of N. The same goes for the code hints in the assignment (see this post).

N is usually associated with counts (natural numbers) regardless of what is being counted, so I guess we could not call it a mistake, but for clarity it could have been V.


Thank you for the clarification, @arvyzukai!

Thank you for the note. I was going to ask the same question. In my opinion, the video lecture should have a clarification pop-up on this matter. N and V are not the same: N is the number of states (tags), and V is the number of words in the vocabulary. When summing over a row, the probabilities should add up to 1 (quiz question), which is not the case with + N \epsilon in the denominator.
Thank you
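
To spell out the row-sum argument, here is a short derivation. It assumes the smoothed entry has the form (C(t_i, w_j) + \epsilon) / (C(t_i) + V \epsilon), with j running over the V vocabulary words, and that C(t_i) is the total count of tag t_i, so the counts C(t_i, w_j) sum to C(t_i) over j.

```latex
\sum_{j=1}^{V} \frac{C(t_i, w_j) + \epsilon}{C(t_i) + V\epsilon}
  = \frac{\sum_{j=1}^{V} C(t_i, w_j) + V\epsilon}{C(t_i) + V\epsilon}
  = \frac{C(t_i) + V\epsilon}{C(t_i) + V\epsilon}
  = 1
% With N\epsilon in the denominator instead, the same sum gives
\sum_{j=1}^{V} \frac{C(t_i, w_j) + \epsilon}{C(t_i) + N\epsilon}
  = \frac{C(t_i) + V\epsilon}{C(t_i) + N\epsilon}
  \neq 1 \quad \text{unless } N = V \text{ (for } \epsilon > 0\text{)}
```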

Should C(t_i, w_i) be C(t_i, w_j) instead? I don’t think the former can cover all tag-word combinations.

Hi @dweile,
Excellent question! I believe the videos follow the notation of https://web.stanford.edu/~jurafsky/slp3/17.pdf

Given a labeled corpus, where each word w_i is associated with a label t_i,
the emission probabilities P(w_i | t_i) = \frac{C(t_i, w_i)}{C(t_i)} represent the probability that a given tag t_i is associated with a given word w_i, for all i. When we populate the matrix of emission probabilities B = (b_{ij}), the indices instead run over the tag set and the vocabulary: for the i-th tag t_i and the j-th vocabulary word w_j, we compute b_{ij} = \frac{C(t_i, w_j) + \alpha}{C(t_i) + \alpha V} for all i, j with 1 \le i \le N, 1 \le j \le V.
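
For anyone who wants to check this numerically, here is a minimal sketch of populating B with the smoothed formula above. It is not the assignment's code: the function name `build_emission_matrix`, the toy corpus, and the default `alpha=0.001` are all made up for illustration.

```python
import numpy as np

def build_emission_matrix(tagged_words, tags, vocab, alpha=0.001):
    """Return the N x V matrix with b_ij = (C(t_i, w_j) + alpha) / (C(t_i) + alpha * V).

    Hypothetical helper for illustration; assumes every (word, tag) pair
    in tagged_words uses a word from vocab and a tag from tags.
    """
    tag_index = {t: i for i, t in enumerate(tags)}
    word_index = {w: j for j, w in enumerate(vocab)}

    # counts[i, j] = C(t_i, w_j): how often tag i was paired with word j
    counts = np.zeros((len(tags), len(vocab)))
    for word, tag in tagged_words:
        counts[tag_index[tag], word_index[word]] += 1

    # C(t_i) is the row total of the pair counts
    tag_totals = counts.sum(axis=1, keepdims=True)

    # V (the vocabulary size) multiplies alpha in the denominator, not N
    return (counts + alpha) / (tag_totals + alpha * len(vocab))

# Toy labeled corpus (invented data): N = 3 tags, V = 4 words
corpus = [("the", "DT"), ("dog", "NN"), ("barks", "VB"), ("the", "DT"), ("cat", "NN")]
tags = ["DT", "NN", "VB"]
vocab = ["the", "dog", "cat", "barks"]

B = build_emission_matrix(corpus, tags, vocab)
row_sums = B.sum(axis=1)  # each row should sum to 1 with alpha * V in the denominator
```

Because N = 3 and V = 4 differ in this toy example, replacing `len(vocab)` with `len(tags)` in the denominator makes the row sums drift away from 1, which is the issue raised earlier in the thread.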
