Hi, the formula for perplexity in the Lab is not the same as the formula discussed in Course 2 of the specialization.
This is the formula that we have in the second course (the same as on Wikipedia):
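PP(W) = P(w_1, w_2, \ldots, w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1, w_2, \ldots, w_N)}}
where N is the total number of words in the test set W.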
The one in the Lab is:
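PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}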
where we have the same number N
Hello @Ibrahim_RIDENE
Kindly share the Wikipedia link you are referring to, so a proper explanation can be provided after reviewing and comparing both the lab and Wikipedia versions as you stated.
Regards
DP
Hi @Ibrahim_RIDENE,
Both formulas are correct. The second one is derived from the first in the case of a bi-gram model, under the condition that all the sentences in the test set get concatenated.
This is talked about in the Language Model Evaluation chapter of Autocomplete, at timestamp 5:00.
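Concretely, once the sentences in the test set are concatenated into one sequence of N words, the bi-gram assumption gives
P(w_1, w_2, \ldots, w_N) \approx \prod_{i=1}^{N} P(w_i \mid w_{i-1})
so the first formula becomes
PP(W) = P(w_1, w_2, \ldots, w_N)^{-\frac{1}{N}} \approx \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}
which is the second (lab) form.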
Hi,
Please find the link to the Wikipedia formula: Perplexity - Wikipedia.
@jyadav202, I would like to mention that even the formula you provided is not totally correct, for the following reason:
- The m under the root is not the same as the m of the outer product inside the root. The first must be the total number of words in the corpus, while the second is the number of sentences in the corpus.
It seems like the following equation for the perplexity of a bigram model is computing per-sentence instead of per-word perplexity:
PP(W) = \sqrt[m]{\prod_{i=1}^{m}\prod_{j=1}^{|s_i|} \frac{1}{P(w_j^{(i)}| w_{j-1}^{(i)})}}
where m is the number of sentences in the test set W, |s_i| is the number of words in sentence i, and w_j^{(i)} is the j-th word in sentence i.
It would make sense to me if the -1/m exponent in the above equation were actually -1/N, where
N = \sum_{i=1}^{m} |s_i|,
i.e. the total number of words across all sentences.
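Written out, the per-word version would read:
PP(W) = \sqrt[N]{\prod_{i=1}^{m}\prod_{j=1}^{|s_i|} \frac{1}{P(w_j^{(i)} \mid w_{j-1}^{(i)})}}, \qquad N = \sum_{i=1}^{m} |s_i|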
If so, when I said that all the sentences in the test set get concatenated, it would mean:
i = m = 1
|s_i| = |s_1| = N
Which gives:
PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i| w_{i-1})}}
(NOTE: the lab denotes both perplexity and probability as P(), which is super confusing.)
Anyway, your original query was about the above equation, which is correct and is derived from the cross-entropy of the model for the probability distribution P. I will not go into the proof, but will direct you to section 3.3 of this chapter: https://web.stanford.edu/~jurafsky/slp3/3.pdf
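If it helps to see it in code, here is a minimal Python sketch of that per-word computation, done in log space. The bigram_prob mapping below is just a made-up stand-in for whatever estimate of P(w_i | w_{i-1}) the lab builds; the lab's actual estimator and smoothing are not reproduced here.

import math

def bigram_perplexity(words, bigram_prob):
    # words: a tokenized test sequence
    # bigram_prob: hypothetical mapping (prev_word, word) -> P(word | prev_word)
    tokens = ["<s>"] + words + ["</s>"]   # pad so the first word has a predecessor
    N = len(tokens) - 1                   # number of predicted words
    log_sum = sum(math.log(bigram_prob[(prev, cur)])
                  for prev, cur in zip(tokens, tokens[1:]))
    # exp(-(1/N) * sum of log P(w_i | w_{i-1})) is exactly the
    # N-th root of the product of 1 / P(w_i | w_{i-1}).
    return math.exp(-log_sum / N)

# Toy example with made-up probabilities:
probs = {("<s>", "i"): 0.5, ("i", "like"): 0.4,
         ("like", "nlp"): 0.2, ("nlp", "</s>"): 0.5}
print(bigram_perplexity(["i", "like", "nlp"], probs))   # ~2.66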
I will also pass the suggestion on to the content creators of the slides and the lab to add clarity to this equation.
Thanks!
Thanks for the clarification! Really appreciate it.