It seems like the following equation for the perplexity of a bigram model is computing per-sentence rather than per-word perplexity:
where m is the number of sentences in the test set W, |s_i| is the number of words in sentence i, and w_j^{(i)} is the j-th word of sentence i.
It would make sense to me if the -1/m exponent in the above equation were instead -1/N, where N = \sum_{i=1}^{m}{|s_i|}, i.e. the total number of words across all sentences.
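For reference, here is my reconstruction of what that per-word version would look like, built only from the definitions above (the lab's actual equation is shown as an image, so the exact form here is an assumption):

PP(W) = \left( \prod_{i=1}^{m} \prod_{j=1}^{|s_i|} P(w_j^{(i)} \mid w_{j-1}^{(i)}) \right)^{-1/N}, where N = \sum_{i=1}^{m}{|s_i|}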
If so, when I said:
it would mean:
i = m = 1 (the whole test set treated as a single sentence)
|s_i| = |s_1| = N
Which gives:
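Substituting m = 1 and |s_1| = N into the per-word form sketched above (and dropping the sentence index, since there is only one sentence):

PP(W) = \left( \prod_{j=1}^{N} P(w_j \mid w_{j-1}) \right)^{-1/N}

which is the usual per-word perplexity of the test set.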
(NOTE: the lab denotes both perplexity and probability as P(), which is super confusing.)
Anyway, your original query was about the equation above, which is correct and is derived from the cross-entropy of the model for the probability distribution P. I will not go into the proof, but will direct you to Section 3.3 of this link: https://web.stanford.edu/~jurafsky/slp3/3.pdf
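In code, per-word perplexity is just the exponentiated cross-entropy, i.e. exp of the average negative log-probability per word. A minimal sketch, assuming a hypothetical bigram_prob(prev, word) function that returns P(word | prev) and <s> / </s> sentence-boundary tokens:

```python
import math

def perplexity(sentences, bigram_prob):
    """Per-word perplexity of a bigram model over tokenized sentences.

    sentences: list of sentences, each a list of word strings.
    bigram_prob(prev, word): assumed to return P(word | prev).
    """
    log_prob_sum = 0.0
    n_words = 0
    for sentence in sentences:
        # Pad with assumed boundary tokens so every word has a predecessor.
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob_sum += math.log(bigram_prob(prev, word))
            n_words += 1
    # Cross-entropy H = -(1/N) * sum of log-probabilities; perplexity = exp(H).
    return math.exp(-log_prob_sum / n_words)
```

Note that the summed log-probability is divided by the total word count N, not by the number of sentences m, which is exactly the per-word normalisation discussed above.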
I will also take this suggestion to the content creators of the slides and lab so they can add clarity around this equation.
Thanks!