Q10 - calculate_perplexity

joaopedrovtp · October 25, 2022, 10:28pm

I don’t know what I’m doing wrong here. I went through the reading and the lecture lab, and my code seems right to me. Could someone help?

paulinpaloalto · October 25, 2022, 11:03pm

I haven’t gotten to NLP C2 yet, but what is the definition of an “n-gram”? It’s a sequence of words, isn’t it? But it looks like you’re setting the variable n_gram to a single word in your logic. Sure 1 word is a trivial sequence, but I’m gonna go out on a limb here and guess there’s a reason why the t value starts at n in the loop there.

I’m also a little worried about what they mean in that comment before the loop where they say that the loop is “inclusive on both ends”. Try running the following loop in python and watch what happens:

for ii in range(2,7):
    print(f"ii = {ii}")

As I said earlier, I have not worked this assignment, so maybe I’m not helping at all here. Just asking questions …
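For anyone who would rather read than run it, here is a small sketch of what that loop shows. The values of `n` and `N` below are made up for illustration and are not taken from the assignment.

```python
# range(start, stop) includes start but excludes stop.
values = list(range(2, 7))
print(values)  # [2, 3, 4, 5, 6]

# So to cover "n to N - 1, inclusive on both ends",
# the stop argument has to be N, not N - 1.
n, N = 2, 7  # illustrative values only
indices = list(range(n, N))
print(indices[0], indices[-1])  # 2 6
```

In other words, a comment that says "inclusive on both ends" describes the indices visited, not the arguments passed to `range`.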

davidguo94 · October 26, 2022, 5:29am

Yeah, like Paul said, I think you’re having an issue with your range and with subsetting the n_gram.

joaopedrovtp · October 26, 2022, 12:10pm

Oops, you guys are right. Thanks!

wdduncan · November 22, 2022, 9:55pm

@joaopedrovtp @davidguo94
I’m stumped on this one too, but I’m getting very different perplexity scores. I feel like I’m subsetting the previous n_gram incorrectly, but I can’t seem to figure out the right way. Any help?

    # Index t ranges from n to N - 1, inclusive on both ends
    for t in range(n, N):

        # get the n-gram preceding the word at position t
        n_gram = sentence[t-n]
        
        # get the word at position t
        word = sentence[t]
        
        # Estimate the probability of the word given the n-gram
        # using the n-gram counts, n-plus1-gram counts,
        # vocabulary size, and smoothing constant
        probability = estimate_probability(word, n_gram, n_gram_counts, n_plus1_gram_counts, vocabulary_size,k)
        
        # Update the product of the probabilities
        # This 'product_pi' is a cumulative product 
        # of the (1/P) factors that are calculated in the loop
        product_pi *= 1/probability
        
        ### END CODE HERE ###

    # Take the Nth root of the product
    perplexity = (product_pi)**(1/N)
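
Read off the comments in the code above, the quantity being computed appears to be the following, with zero-based positions, N taken as the length of the sentence list the loop runs over, and each probability conditioned on the n words before position t. This is a reading of the snippet, not a quote from the course notes.

```latex
PP(W) = \left( \prod_{t=n}^{N-1} \frac{1}{P\left(w_t \mid w_{t-n}, \dots, w_{t-1}\right)} \right)^{1/N}
```

The conditioning context is n words long, which is why the choice of what goes into n_gram changes the result.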

For my output I get:

Perplexity for first train sample: 4.0032
Perplexity for test sample: 4.8076

But the correct answer should be:

Perplexity for first train sample: 2.8040
Perplexity for test sample: 3.9654

Help? Am I missing something really simple?

wdduncan · November 22, 2022, 11:14pm

I solved my subsetting issue … no need to respond

Upendra_kumar · December 13, 2022, 10:29am

What value did you assign there?

paulinpaloalto · December 13, 2022, 3:42pm

The point is that the n-gram is a sequence of words, so the index value that you use to “slice” the list sentence needs to have a start and an end value to define the range. In both the incorrect code samples shown earlier on this thread, you can see that the index specified will select just a single word.
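To make the difference between indexing and slicing concrete, here is a minimal sketch on a toy list. The tokens and the values of `n` and `t` are invented for illustration, and the tuple conversion at the end assumes the n-gram counts are kept in a dictionary keyed by tuples, which may not match your notebook.

```python
# Toy sentence; tokens, n and t are illustrative, not from the assignment.
tokens = ["<s>", "<s>", "i", "like", "a", "cat", "<e>"]
n = 2
t = 4  # position of the word "a"

# Indexing with a single value returns one element (a string here).
single_word = tokens[t - n]
print(single_word)  # i

# Slicing with a start and an end returns a list of n elements:
# the n words immediately before position t.
previous_n = tokens[t - n:t]
print(previous_n)  # ['i', 'like']

# Assumption: if counts are stored in a dict keyed by tuples,
# the list needs converting before it can be used as a key.
key = tuple(previous_n)
```

The slice stops just before `t`, so the word at position `t` itself is not part of the preceding n-gram.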

MARY_GLANTZ · December 29, 2022, 4:57pm

Thank you! That was a really helpful response for identifying what the problem was with my code.
