Quick question, because the quiz confused me: when calculating perplexity, e.g. for this test data: <s> Mary likes cats </s>, is m 4 or 5?
Hi @ArminJ
The sentence you mentioned has 5 words (if you count <s> and </s>) and 4 bigrams. Make sure that you use the correct one in your calculations.
My understanding is that in the perplexity formula, m is the number of words in the test data, not the number of bigrams. Am I wrong?
Yes, you are correct! m represents the total number of words in the test data.
So I think the answer to that question in the quiz was incorrect; you might want to check. The question also pops up in the videos.
Please take a look at this thread:
Ah ok, so we don't count <s>: it is always given, so the model never has to predict it. Got it.
Yes, that is right!
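For anyone landing on this thread later, here is a minimal sketch of the counting, assuming the convention agreed on above (<s> is context only and is not counted, while </s> is counted because the model predicts it). The function name bigram_perplexity and the uniform probability 0.25 are made up for illustration; they are not from the course material.

```python
import math

def bigram_perplexity(tokens, bigram_prob):
    """Perplexity of one sentence under a bigram model.

    tokens: list of tokens including the <s> and </s> markers.
    bigram_prob: hypothetical function (prev, word) -> P(word | prev).
    """
    # Every token except <s> is predicted from its predecessor,
    # so m is the number of tokens without <s>.
    predicted = tokens[1:]
    m = len(predicted)
    # Sum log-probabilities over the (prev, word) bigrams.
    log_prob = sum(math.log(bigram_prob(prev, word))
                   for prev, word in zip(tokens, predicted))
    # Perplexity is exp of the negative average log-probability.
    return m, math.exp(-log_prob / m)

tokens = "<s> Mary likes cats </s>".split()
# Made-up uniform probability, for illustration only.
m, pp = bigram_perplexity(tokens, lambda prev, word: 0.25)
print(m, pp)
```

Here m comes out as 4 (Mary, likes, cats, </s>). For a single sentence that also happens to equal the number of bigrams, which is probably why the two counts got mixed up earlier in the thread.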