Lab: Calculating perplexity

It seems like the following equation for the perplexity of a bigram model computes per-sentence rather than per-word perplexity:

PP(W) = \sqrt[m]{\prod_{i=1}^{m}\prod_{j=1}^{|s_i|} \frac{1}{P(w_j^{(i)}| w_{j-1}^{(i)})}}

where m is the number of sentences in the test set W, |s_i| is the number of words in sentence i, and w_j^{(i)} is the j-th word of sentence i.

It would make sense to me if the m in the m-th root (i.e. the exponent 1/m) were actually N = \sum_{i=1}^{m}|s_i|, the total number of words across all sentences.
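To make the difference concrete, here is a minimal sketch (in log space, to avoid underflow from multiplying many small probabilities) that computes both normalizations. The dictionary bigram_prob, the padded sentences, and the function name are hypothetical placeholders, not the lab's actual code:

```python
import math

def bigram_perplexity(sentences, bigram_prob, per_word=True):
    """Perplexity of tokenized, padded sentences under a bigram model.

    per_word=False reproduces the lab's m-th root (m = number of sentences);
    per_word=True uses the N-th root, where N is the total word count.
    """
    total_log_prob = 0.0
    total_words = 0
    for sent in sentences:
        # Sum log P(w_j | w_{j-1}) over consecutive word pairs.
        for prev, word in zip(sent[:-1], sent[1:]):
            total_log_prob += math.log(bigram_prob[(prev, word)])
        total_words += len(sent) - 1  # number of predicted tokens
    denom = total_words if per_word else len(sentences)
    # exp(-(1/denom) * sum log P) is the log-space form of the root of the product.
    return math.exp(-total_log_prob / denom)
```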

If so, then in the special case where the whole test set is treated as a single sentence, this would mean:

m = 1 (so the only sentence index is i = 1)
|s_i| = |s_1| = N

Which gives:

PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i| w_{i-1})}}

(NOTE: the lab denotes both perplexity and probability as P(), which is super confusing.)
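As a quick numerical sanity check on this single-sentence, per-word form, here is a tiny self-contained example with made-up bigram probabilities (N = 4 predicted tokens):

```python
import math

# P(w_i | w_{i-1}) for each of the N = 4 bigrams in one toy sentence (made-up values).
probs = [0.5, 0.4, 0.3, 0.6]
log_prob = sum(math.log(p) for p in probs)
pp = math.exp(-log_prob / len(probs))
# Equivalent to (1 / (0.5 * 0.4 * 0.3 * 0.6)) ** (1/4) ≈ 2.30
print(pp)
```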

Anyway, your original query was about the equation above, which is correct and is derived from the cross-entropy of the model for the probability distribution P. I will not go into the proof here, but will direct you to Section 3.3 of this chapter: https://web.stanford.edu/~jurafsky/slp3/3.pdf
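For reference, the connection is that perplexity is the exponentiated per-word cross-entropy (average negative log probability):

PP(W) = 2^{H(W)}, where H(W) = -\frac{1}{N}\sum_{i=1}^{N}\log_2 P(w_i | w_{i-1})

which, when expanded, recovers \left(\prod_{i=1}^{N} P(w_i | w_{i-1})\right)^{-1/N}, i.e. the N-th-root form above.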

I will also pass this suggestion along to the content creators of the slides and lab so they can clarify this equation.

Thanks!
