Test data and perplexity

Hi,
I am working on this week’s assignment: autocomplete. Up until the bottom of the notebook, it seems we haven’t really used perplexity, and we haven’t really used test_data either. I guess the perplexity of about 4 that we find is for a toy test sentence:

[screenshot of the perplexity calculation]

However, we haven’t measured perplexity on our test data. Am I right, or am I missing something here?

Hi @Fei_Li

Why would we have used it earlier? Perplexity is a measurement for model evaluation (e.g. how well we trained our model).

We have used it during training (unless you didn’t use it :slight_smile: ) in # UNQ_C4:
bare_eval_generator = ...
where we use eval_lines (declared earlier, at the top of the notebook: eval_lines = lines[-1000:] # Create a holdout validation set).

Yes, we did not measure perplexity on the test data (in # UNQ_C6 we do it only on a single batch of training data).

Cheers

Hi Mentor @arvyzukai, I understand everything else except this part. My UNQ_C4 is def count_words(tokenized_sentences).
I think I misplaced this thread; I will move it to C2W3. Did that confuse you? Sorry about that. Would you please take a look again? Thank you very much.

Hi @Fei_Li

Yes, you are correct. The wrong topic category threw me off, and my response was not for C2W3.

You are correct, and to be fair, it’s strange :+1: I hadn’t noticed it previously :slight_smile: The sentence used to calculate perplexity there is just some random test sentence, not the test data.
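If you want to measure perplexity over the whole held-out test set rather than a single sentence, here is a minimal sketch. Note this is not the assignment’s code: the add-k-smoothed bigram model, the `bigram_perplexity` helper, and the toy corpora below are my own assumptions for illustration only.

```python
import math
from collections import Counter

def bigram_perplexity(sentence, unigram_counts, bigram_counts, vocab_size, k=1.0):
    """Perplexity of one tokenized sentence under an add-k-smoothed bigram model:
    PP(W) = exp(-(1/N) * sum_i log P(w_i | w_{i-1}))."""
    tokens = ["<s>"] + sentence + ["<e>"]
    n = len(tokens) - 1  # number of bigrams scored
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts.get((prev, word), 0) + k
        denominator = unigram_counts.get(prev, 0) + k * vocab_size
        log_prob += math.log(numerator / denominator)
    return math.exp(-log_prob / n)

# Toy training corpus (stand-in for the notebook's preprocessed train_data)
train_data = [["i", "like", "a", "cat"],
              ["this", "dog", "is", "like", "a", "cat"]]
unigram_counts, bigram_counts = Counter(), Counter()
for sent in train_data:
    toks = ["<s>"] + sent + ["<e>"]
    unigram_counts.update(toks)
    bigram_counts.update(zip(toks, toks[1:]))
vocab_size = len(unigram_counts)  # includes <s> and <e>

# Average perplexity over a held-out test set, not just one random sentence
test_data = [["i", "like", "a", "dog"],
             ["this", "cat", "is", "like", "a", "dog"]]
avg_pp = sum(bigram_perplexity(s, unigram_counts, bigram_counts, vocab_size)
             for s in test_data) / len(test_data)
print(f"average test perplexity: {avg_pp:.4f}")
```

Averaging per-sentence perplexity like this is one simple convention; another is to pool the log-probabilities of all test bigrams and exponentiate once, which weights longer sentences more.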

Cheers