In the lecture, the word ‘emission’ in ‘emission probability’ gave me the intuition that it is the probability of the next word being word1 given that the POS tag of the previous word, word0, is posx. The idea is that a verb is usually preceded by a noun, or that a sentence does not normally start with a verb such as ‘is’. Such grammatical probabilities could improve the prediction. However, in the lab, the emission counts simply record how often a particular word occurs with each POS tag, not how often a word is preceded by a particular POS tag.
For example, the Wall Street corpus consists of sentences with a POS tag for each word in the sentence, not a random list of unrelated words. Consider this sample of the test corpus:
['The\tDT\n', 'economy\tNN\n', "'s\tPOS\n", 'temperature\tNN\n', 'will\tMD\n', 'be\tVB\n', 'taken\tVBN\n', 'from\tIN\n', 'several\tJJ\n', 'vantage\tNN\n']
The example shows that the order of the words matters: the POS tag DT of the word ‘The’ precedes the word ‘economy’, and so on.
However, this ordering is not taken into account at all when calculating the emission counts and probabilities. Suppose I construct an emission counts matrix that instead gives the probability of a word given the POS tag of the preceding word, as observed in the sample corpus. Would that not improve the accuracy?
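To make the two options concrete, here is a minimal sketch of both counts on the sample above. The `(tag, word)` count follows my description of the lab; the `(prev_tag, word)` count is the hypothetical alternative I am asking about. The names `emission_counts`, `prev_tag_counts`, `parse` and the start marker `'--s--'` are my own assumptions for illustration, not necessarily what the lab code uses.

```python
from collections import defaultdict

# The ten-entry sample from this post; each entry is "word\ttag\n".
sample = ['The\tDT\n', 'economy\tNN\n', "'s\tPOS\n", 'temperature\tNN\n',
          'will\tMD\n', 'be\tVB\n', 'taken\tVBN\n', 'from\tIN\n',
          'several\tJJ\n', 'vantage\tNN\n']

def parse(line):
    """Split one corpus line into (word, tag)."""
    word, tag = line.strip().split('\t')
    return word, tag

# (tag, word): how often a word occurs with its OWN tag.
emission_counts = defaultdict(int)
# (prev_tag, word): hypothetical alternative, keyed on the PREVIOUS word's tag.
prev_tag_counts = defaultdict(int)

prev_tag = '--s--'  # assumed start-of-sentence marker
for line in sample:
    word, tag = parse(line)
    emission_counts[(tag, word)] += 1
    prev_tag_counts[(prev_tag, word)] += 1
    prev_tag = tag

print(emission_counts[('NN', 'economy')])   # word counted under its own tag
print(prev_tag_counts[('DT', 'economy')])   # word counted under the previous tag
```

In the first dictionary ‘economy’ is filed under NN, its own tag; in the second it is filed under DT, the tag of ‘The’.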
Is my intuition correct, or will it lead to errors? (Perhaps the transition matrix takes care of the sequence of tags irrespective of the words, but even if so, the word ‘emission’ is misleading!)
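For comparison, this is a rough sketch of how I understand tag-to-tag transition counts would be tallied on the same sample. It tracks only the sequence of tags and ignores the words. Again, `transition_counts` and the `'--s--'` start marker are assumed names for illustration.

```python
from collections import defaultdict

# Same sample as above; each entry is "word\ttag\n".
sample = ['The\tDT\n', 'economy\tNN\n', "'s\tPOS\n", 'temperature\tNN\n',
          'will\tMD\n', 'be\tVB\n', 'taken\tVBN\n', 'from\tIN\n',
          'several\tJJ\n', 'vantage\tNN\n']

# (prev_tag, tag): how often one tag follows another, regardless of the words.
transition_counts = defaultdict(int)

prev_tag = '--s--'  # assumed start-of-sentence marker
for line in sample:
    _, tag = line.strip().split('\t')
    transition_counts[(prev_tag, tag)] += 1
    prev_tag = tag

print(transition_counts[('DT', 'NN')])  # DT followed by NN ('The' -> 'economy')
```

If this is right, the ordering information in my example (DT before ‘economy’) is captured here as DT followed by NN, while the emission counts only link ‘economy’ to NN.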