Adding start sequence tags and perplexity calculation

Hi @Ritu_Pande

Regarding point 1, here’s the explanation:

Take a look at the ('<s>', '<s>') element in the bi-gram dictionary. A bi-gram only needs one start token, as in the element ('<s>', 'i'). The ('<s>', '<s>') element is there for computing probabilities with tri-grams: its count is used as the denominator when estimating the probability of each sentence’s first word, P(i | <s>, <s>).

In simple words: it’s there for practical reasons, so that the implementation works across the whole assignment.
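Here is a minimal Python sketch of where that entry comes from. It assumes each sentence is padded with n start tokens and one end token when counting n-grams. The names count_n_grams and estimate_probability, the '<e>' end token, and the add-k smoothing are my own assumptions for illustration, not necessarily the assignment’s exact code.

```python
from collections import Counter

START = "<s>"  # assumed start-token spelling
END = "<e>"    # assumed end-token spelling (hypothetical)


def count_n_grams(sentences, n):
    """Count n-grams, padding each sentence with n start tokens and one end token."""
    counts = Counter()
    for sentence in sentences:
        padded = [START] * n + sentence + [END]
        # Slide a window of width n over the padded sentence.
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def estimate_probability(word, previous, n_counts, n_plus1_counts, vocab_size, k=1.0):
    """Add-k smoothed P(word | previous), where `previous` is an n-gram tuple.

    The (n+1)-gram count is the numerator; the n-gram count of the
    context is the denominator. Missing keys count as 0 in a Counter.
    """
    numerator = n_plus1_counts[previous + (word,)] + k
    denominator = n_counts[previous] + k * vocab_size
    return numerator / denominator


if __name__ == "__main__":
    sentences = [["i", "like", "a", "cat"],
                 ["this", "dog", "is", "like", "a", "cat"]]
    bigrams = count_n_grams(sentences, 2)
    trigrams = count_n_grams(sentences, 3)
    vocab_size = len({w for s in sentences for w in s})
    # ('<s>', '<s>') occurs once per sentence, so its count is the
    # denominator of the tri-gram probability of each first word.
    print(bigrams[(START, START)])
    print(estimate_probability("i", (START, START), bigrams, trigrams, vocab_size))
```

Padding the bi-gram counts with two start tokens produces ('<s>', '<s>') once per sentence. That count is exactly what the tri-gram estimate for the first word divides by.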

Regarding point 2, I’m not sure why, but if I had to guess, it’s to avoid overcomplicating the assignment.

In general, perplexity scores should not include start tokens or any other special tokens, but accounting for that might make the assignment too complicated for learners to complete.
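To make the effect concrete, here is a minimal Python sketch comparing two choices of N in PP = (∏ 1/P(w_t | context))^(1/N). The sketch is my own, not the assignment code. One choice counts the padded start tokens in N. The other counts only the tokens the model actually predicts, which are the words plus the end token. The token spellings, the function names, and the uniform 0.25 probabilities are hypothetical and were picked so the arithmetic is easy to follow. This sketch only deals with start tokens.

```python
import math

START = "<s>"  # assumed start-token spelling
END = "<e>"    # assumed end-token spelling (hypothetical)


def perplexity_from_probs(probs, n_tokens):
    """Return exp(-(1/n_tokens) * sum(log p)), i.e. (prod 1/p) ** (1/n_tokens)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    log_likelihood = sum(math.log(p) for p in probs)
    return math.exp(-log_likelihood / n_tokens)


def compare_normalisations(sentence, n, probs):
    """Perplexity normalised by the padded length vs. by predicted tokens only.

    `probs` holds one probability per predicted token (every word of the
    sentence plus the end token). Start tokens are context only and are
    never predicted, so they contribute no probability.
    """
    padded = [START] * n + sentence + [END]
    predicted = padded[n:]  # words + end token
    if len(probs) != len(predicted):
        raise ValueError("need one probability per predicted token")
    with_starts = perplexity_from_probs(probs, len(padded))
    without_starts = perplexity_from_probs(probs, len(predicted))
    return with_starts, without_starts


if __name__ == "__main__":
    sentence = ["i", "like", "a", "cat"]
    probs = [0.25] * 5  # hypothetical: every predicted token gets p = 0.25
    with_starts, without_starts = compare_normalisations(sentence, n=2, probs=probs)
    print(f"N includes start tokens: {with_starts:.4f}")
    print(f"N excludes start tokens: {without_starts:.4f}")
```

When every probability is 0.25, excluding the start tokens gives a perplexity of 4, which is one over the per-token probability. Counting the two start tokens in N lowers it to 4^(5/7), about 2.69, even though the predictions are identical. This is why including start tokens makes a model look better than it really is.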

Cheers