In the perplexity lecture [Language Model Evaluation], is the parameter “m” the number of words in the test set W or the number of sentences in the test set W?
The slide is contradictory: it uses m as the subscript of s, but labels it as the number of words.
Yes, the slide is unfortunately contradictory (probably because of a typo). The s should have been subscripted with _i, as in (s_1, s_2, ..., s_i).
I had the same question. The typo is still there as of July 2023.
That lecture also has a contradiction in a following slide: in the first equation there, m is the number of sentences, while in the second it is the total number of words in all sentences.
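To make the two roles concrete, here is a minimal sketch that keeps them as separate quantities. It assumes the usual definition of perplexity, where the product runs over the sentences but the exponent is normalised by the total number of words. The function name `perplexity` and its arguments are my own, not anything from the lecture or its notebooks, and I am assuming each sentence length already counts the end-of-sentence token.

```python
def perplexity(sentence_log2_probs, sentence_lengths):
    """Perplexity of a test set from per-sentence log2 probabilities.

    sentence_log2_probs[i] is log2 P(s_i) for the i-th sentence.
    sentence_lengths[i] is the number of words in s_i (assumed here to
    include the end-of-sentence token).
    """
    if len(sentence_log2_probs) != len(sentence_lengths):
        raise ValueError("need one length per sentence")

    # Role 1: the number of sentences only bounds the sum (the product
    # over s_1 ... s_n in the formula).
    total_log2_prob = sum(sentence_log2_probs)

    # Role 2: the total number of words is what normalises the exponent.
    total_words = sum(sentence_lengths)
    if total_words == 0:
        raise ValueError("test set has no words")

    return 2.0 ** (-total_log2_prob / total_words)


# Two sentences of two words each, every word with probability 1/4,
# so log2 P(s_i) = -4 for both sentences.
print(perplexity([-4.0, -4.0], [2, 2]))
```

With these numbers the result is 4, which matches the per-word probability of 1/4. Dividing by the number of sentences (2) instead of the number of words (4) would give 16, which is why it matters which of the two the slide means by m.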