Emission Counts and Probability Matrix

nmurugesh · April 11, 2023, 1:08pm

In the lecture, the word ‘emission’ in ‘emission probability’ gave me the intuition that it means the probability of the next word being word1 given that the previous word word0 has the POS tag posx. The intuition is that, for example, a verb tends to be preceded by a noun, or that a sentence does not start with a verb like ‘is’. Such grammatical probabilities could improve the prediction. However, in the lab, the emission counts simply count how often a particular word occurs with each POS tag, not how often a word is preceded by a particular POS tag.
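Written as conditional probabilities (the notation w_i for the i-th word and t_i for its tag is assumed here), the reading described above and the quantity the lab's emission counts estimate differ only in which tag is conditioned on:

$$
\text{intuition: } P(w_i = \text{word1} \mid t_{i-1} = \text{posx})
\qquad \text{vs.} \qquad
\text{lab: } P(w_i \mid t_i)
$$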

For example, the Wall Street Journal corpus consists of sentences with a POS tag for each word in the sentence, not a random list of words. Consider this sample of the test corpus:
['The\tDT\n', 'economy\tNN\n', "'s\tPOS\n", 'temperature\tNN\n', 'will\tMD\n', 'be\tVB\n', 'taken\tVBN\n', 'from\tIN\n', 'several\tJJ\n', 'vantage\tNN\n']
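A minimal sketch of how lines in this format can be split into (word, tag) pairs, assuming every line is exactly `word<TAB>tag<NEWLINE>` as in the sample above. The function name `parse_tagged_lines` is hypothetical and not taken from the lab.

```python
# Sample lines copied from the test corpus excerpt above.
sample = ["The\tDT\n", "economy\tNN\n", "'s\tPOS\n", "temperature\tNN\n",
          "will\tMD\n", "be\tVB\n", "taken\tVBN\n", "from\tIN\n",
          "several\tJJ\n", "vantage\tNN\n"]

def parse_tagged_lines(lines):
    """Split tab-separated 'word<TAB>tag' lines into (word, tag) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()  # splits on the tab and drops the trailing newline
        if len(fields) == 2:   # skip blank or malformed lines
            word, tag = fields
            pairs.append((word, tag))
    return pairs

pairs = parse_tagged_lines(sample)
print(pairs[:3])  # [('The', 'DT'), ('economy', 'NN'), ("'s", 'POS')]
```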

The example shows that word order matters: the tag DT of the word ‘The’ precedes the word ‘economy’, and so on.

But this is not taken into account at all when calculating the emission counts and probabilities. Suppose I construct an emission count matrix that gives the probability of a word given the POS tag of the previous word, as observed in the sample corpus. Would that not improve the accuracy?
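To make the contrast concrete, here is a small sketch (all names hypothetical) that builds both count tables from the sample above: lab-style emission counts keyed by (tag of the word, word), and the proposed variant keyed by (tag of the previous word, word). Using `--s--` as the "previous tag" of the first word is an assumption.

```python
from collections import defaultdict

# (word, tag) pairs from the sample corpus excerpt.
pairs = [("The", "DT"), ("economy", "NN"), ("'s", "POS"), ("temperature", "NN"),
         ("will", "MD"), ("be", "VB"), ("taken", "VBN"), ("from", "IN"),
         ("several", "JJ"), ("vantage", "NN")]

START = "--s--"  # assumed placeholder tag meaning "no previous word"

# Lab-style emission counts: how often each word appears with its OWN tag.
emission_counts = defaultdict(int)
# Proposed variant (hypothetical): how often each word follows a word with a given tag.
prev_tag_counts = defaultdict(int)

prev_tag = START
for word, tag in pairs:
    emission_counts[(tag, word)] += 1
    prev_tag_counts[(prev_tag, word)] += 1
    prev_tag = tag  # this word's tag becomes the "previous tag" for the next word

print(emission_counts[("NN", "economy")])  # 1
print(prev_tag_counts[("DT", "economy")])  # 1
```

The second table is not what the lab computes; it only illustrates what the proposal would count.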

Is my intuition correct, or will it lead to errors? (Perhaps the transition matrix takes care of the tag sequence irrespective of the words, but even so, the word ‘emission’ is misleading!!!)
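For reference, a standard first-order HMM tagger factors the joint probability of words w_1..w_n and tags t_1..t_n as follows, with t_0 taken to be a start state:

$$
P(w_{1:n}, t_{1:n}) = \prod_{i=1}^{n} \underbrace{P(t_i \mid t_{i-1})}_{\text{transition } A} \; \underbrace{P(w_i \mid t_i)}_{\text{emission } B}
$$

In this factorization the sequence information lives in the transition term, while the emission term only links each word to its own tag; a word's dependence on the previous tag enters indirectly, through t_i.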

Elemento · April 12, 2023, 7:39am

Hey @nmurugesh,
I guess there is a small misinterpretation in your understanding of the emission probabilities. The emission probability, say B(NN, “going”), is the probability that the current word is “going”, given that the tag of the current word is NN (noun). Neither the previous word nor the previous tag is involved in the emission probabilities. I hope this resolves your query.
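A minimal sketch of estimating B(tag, word) = P(word | tag) from counts on the sample corpus, with optional add-alpha smoothing (C(tag, word) + alpha) / (C(tag) + alpha * N), where N is the vocabulary size. The function name `emission_prob` and the default alpha are assumptions, not the lab's code.

```python
from collections import Counter

# (word, tag) pairs from the sample corpus excerpt.
pairs = [("The", "DT"), ("economy", "NN"), ("'s", "POS"), ("temperature", "NN"),
         ("will", "MD"), ("be", "VB"), ("taken", "VBN"), ("from", "IN"),
         ("several", "JJ"), ("vantage", "NN")]

tag_counts = Counter(tag for _, tag in pairs)                   # C(tag)
emission_counts = Counter((tag, word) for word, tag in pairs)   # C(tag, word)
vocab = {word for word, _ in pairs}                             # N = len(vocab)

def emission_prob(tag, word, alpha=0.0):
    """Estimate B(tag, word) = P(word | tag); only the word's OWN tag is used."""
    numerator = emission_counts[(tag, word)] + alpha
    denominator = tag_counts[tag] + alpha * len(vocab)
    if denominator == 0:  # unseen tag with no smoothing
        return 0.0
    return numerator / denominator

print(emission_prob("NN", "economy"))  # 1/3: 'economy' is 1 of the 3 NN tokens
```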

Cheers,
Elemento
