Adding start sequence tags and perplexity calculation

Ritu_Pande · August 7, 2023, 9:11pm

I completed my assignment for week 3, but I could not understand why there are the following differences between the lecture and the assignment.

1. UNQ_C8 GRADED FUNCTION: count_n_grams adds n start tokens (as many as the n-gram length) instead of the n-1 start tokens mentioned in the lecture. The reason given in the comments is not clear to me.
2. UNQ_C10 GRADED FUNCTION: calculate_perplexity sets N = len(sentence), which includes the start token and the end token. However, the lecture says that N should not include the start token. Why does the assignment not follow this?

arvyzukai · August 21, 2023, 1:09pm

Hi @Ritu_Pande

Regarding point 1, there’s an explanation:

Take a look at the ('<s>', '<s>') element in the bi-gram dictionary. Although for a bi-gram you will only require one starting mark, as in the element ('<s>', 'i'), this ('<s>', '<s>') element will be helpful when computing the probabilities using tri-grams (the corresponding count will be used as denominator).

In simple words, the reason is practical: it serves the implementation of the assignment as a whole.
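
To make the quoted explanation concrete, here is a minimal sketch. It is not the assignment's actual code: `count_n_grams_sketch`, its `n_start_tokens` parameter, and the `<e>` end token are hypothetical names chosen for illustration. It compares bi-gram counts built with one start token against counts built with two, and shows that only the latter contain the ('<s>', '<s>') entry that a tri-gram probability needs as its denominator.

```python
from collections import Counter

def count_n_grams_sketch(sentences, n, n_start_tokens,
                         start_token="<s>", end_token="<e>"):
    """Count n-grams after padding each sentence (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        # Prepend the requested number of start tokens and append one end token.
        padded = [start_token] * n_start_tokens + list(sentence) + [end_token]
        # Slide a window of size n over the padded sentence.
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts

sentences = [["i", "like", "a", "cat"]]

# Bi-grams padded with n = 2 start tokens (what the question describes).
bigrams_n = count_n_grams_sketch(sentences, n=2, n_start_tokens=2)
# Bi-grams padded with n - 1 = 1 start token (the lecture convention).
bigrams_n_minus_1 = count_n_grams_sketch(sentences, n=2, n_start_tokens=1)
# Tri-grams padded with two start tokens, so ('<s>', '<s>', 'i') exists.
trigrams = count_n_grams_sketch(sentences, n=3, n_start_tokens=2)

# Unsmoothed estimate of P(i | <s> <s>): the tri-gram count divided by
# the count of its two-token prefix, taken from the bi-gram dictionary.
prefix_count = bigrams_n[("<s>", "<s>")]
p_i = trigrams[("<s>", "<s>", "i")] / prefix_count
```

With only one start token, `bigrams_n_minus_1[("<s>", "<s>")]` is 0, so the same division would fail unless the prefix count were computed some other way.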

Regarding point 2, I’m not sure why, but if I had to guess, it is to avoid overcomplicating the assignment.

In general, perplexity scores should not include start tokens or any other special tokens, but accounting for that might make the assignment too complicated for learners to complete.
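
As a rough illustration of how much the choice of N matters, here is a small sketch. It assumes the usual definition of perplexity as the inverse probability of the sentence raised to the power 1/N. The function `perplexity_sketch` and the probabilities are made up for the example and are not taken from the assignment.

```python
import math

def perplexity_sketch(probabilities, N):
    """Perplexity from per-token conditional probabilities (hypothetical helper).

    Computed in log space: exp(-(1/N) * sum(log p_i)).
    """
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(-log_sum / N)

# Made-up bi-gram probabilities for the predicted tokens of
# "<s> i like cats <e>": P(i|<s>), P(like|i), P(cats|like), P(<e>|cats).
probs = [0.5, 0.25, 0.5, 0.25]

# N counts the predicted tokens only (words plus end token, no start token).
n_excluding_start = len(probs)
# N also counts the start token, as len() of the padded sentence would.
n_including_start = len(probs) + 1

ppl_excluding = perplexity_sketch(probs, n_excluding_start)
ppl_including = perplexity_sketch(probs, n_including_start)
```

In this sketch, counting the start token in N gives a lower perplexity for the same probabilities, so the two conventions produce different numbers. As long as one convention is applied consistently, the model that scores better under one also scores better under the other on the same sentence.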

Cheers

Ritu_Pande · August 21, 2023, 1:51pm

Thank you for the explanations

Ritu_Pande · August 22, 2023, 6:53pm

Hi arvyzukai, I am not sure if it is possible, but if so, could you pass feedback to the course creators asking them to give a rationale for point 2 in the assignment comments? It made me question my understanding of perplexity calculations and might have the same effect on other learners as well.

arvyzukai · August 23, 2023, 7:52am

Hi @Ritu_Pande

Yes, sure, I think there should be no problem including a sentence mentioning that. Excellent questions, by the way!

Cheers
