Q10 - calculate_perplexity

I don’t know what I’m doing wrong here. I went through the reading and the lab lecture, but my code looks right to me. Could someone help?

I haven’t gotten to NLP C2 yet, but what is the definition of an “n-gram”? It’s a sequence of words, isn’t it? But it looks like you’re setting the variable n_gram to a single word in your logic. Sure, one word is a trivial sequence, but I’m gonna go out on a limb here and guess there’s a reason why the t value starts at n in that loop. :thinking:

I’m also a little worried about what they mean in that comment before the loop, where they say that the loop is “inclusive on both ends”. Try running the following loop in Python and watch what happens:

for ii in range(2,7):
    print(f"ii = {ii}")

As I said earlier, I have not worked through this assignment, so maybe I’m not helping at all here. Just asking questions :nerd_face:

Yeah, like Paul said, I think you’re having an issue with your range and with subsetting the n_gram.

Oops, you guys are right. Thanks!

@joaopedrovtp @davidguo94
I’m stumped on this one too, but I’m getting very different perplexity scores. I feel like I’m subsetting the previous n_gram incorrectly, but I can’t seem to figure out the right way. Any help?

    # Index t ranges from n to N - 1, inclusive on both ends
    for t in range(n, N):

        # get the n-gram preceding the word at position t
        n_gram = sentence[t-n]
        
        # get the word at position t
        word = sentence[t]
        
        # Estimate the probability of the word given the n-gram
        # using the n-gram counts, n-plus1-gram counts,
        # vocabulary size, and smoothing constant
        probability = estimate_probability(word, n_gram, n_gram_counts, n_plus1_gram_counts, vocabulary_size,k)
        
        # Update the product of the probabilities
        # This 'product_pi' is a cumulative product 
        # of the (1/P) factors that are calculated in the loop
        product_pi *= 1/probability
        
        ### END CODE HERE ###

    # Take the Nth root of the product
    perplexity = (product_pi)**(1/N)

For my output I get:

Perplexity for first train sample: 4.0032
Perplexity for test sample: 4.8076

But the correct answer should be:

Perplexity for first train sample: 2.8040
Perplexity for test sample: 3.9654

Help? Am I missing something really simple :frowning:

I solved my subsetting issue … no need to respond :slight_smile:

What value did you assign there?

The point is that the n-gram is a sequence of words, so the index value that you use to “slice” the list sentence needs to have a start and an end value to define the range. In both the incorrect code samples shown earlier on this thread, you can see that the index specified will select just a single word.
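
Here’s a minimal sketch (using a made-up sentence, not anything from the assignment) showing the difference between indexing a single element and slicing a range of a list:

sentence = ["<s>", "<s>", "i", "like", "green", "eggs", "</s>"]
n = 2
t = 4  # position of the word "green"

# a single index returns just one word
print(sentence[t - n])      # i

# a slice with a start and an end returns the n preceding words
print(sentence[t - n : t])  # ['i', 'like']

The slice is what gives you the sequence of n words preceding position t, which is what the n-gram is meant to be.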

Thank you! That was a really helpful response for identifying what the problem was with my code.